APPENDIX I - PARALLEL EXECUTION OF THE DYNAMIC PROGRAMMING TECHNIQUE


1. DYNAMIC PROGRAMMING

Dynamic programming is a mathematical technique devised by Bellman for maximizing a function of n variables:

    R_n(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} g_i(x_i) ,    (I-1)

where

    g_i(0) = 0  and  g_i(x_i) \ge 0 ,

over the region S_n(x) defined by

    \sum_{i=1}^{n} x_i = x ,  x_i \ge 0 .    (I-2)
The  dynamic  programming  technique  is  directly  applicable  to  allocation 
problems. 

Consider the x of Equation I-2 to be a resource that is to be allocated to some n activities. Let x_i denote the allocation to activity i, and g_i(x_i) the resultant return from activity i. Then the total return from all n activities may be expressed by Equation I-1. The problem is to determine an optimal policy of allocation, that is, to maximize Equation I-1 and determine the allocations by which the maximization is effected. The dynamic programming solution to the maximization problem rests on the "discretization" of the range [0, x] and the application of Bellman's principle of optimality, which may be stated: "an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

Bellman, R. E., and Dreyfus, S. E.: Applied Dynamic Programming. Princeton, N. J., Princeton University Press, 1962.

In the execution of the dynamic programming technique, the following sequence is constructed:

    f_1(x), f_2(x), \ldots, f_n(x) ,    (I-3)

where

    f_k(x) = \max_{S_k(x)} R_k(x_1, x_2, \ldots, x_k) ,    (I-4)

with R_k(x_1, x_2, ..., x_k) and S_k(x) as defined in Equations I-1 and I-2.

Making the reasonable definition,

    f_0(x) = 0 ,    (I-5)

and noting that f_1(x) = g_1(x), the following recursive relation can be deduced:

    f_k(x) = \max_{0 \le x_k \le x} [ g_k(x_k) + f_{k-1}(x - x_k) ] ,    (I-6)

thus establishing an inductive method for determining the sequence (I-3). Equation I-6 is just the mathematical expression for the principle of optimality; it allows the reduction of the problem of maximizing one function of n variables to that of maximizing n functions of one variable. In the execution of the dynamic programming technique, the following sequence also is constructed:

    x_1(x), x_2(x), \ldots, x_n(x) ,    (I-7)

where x_k(x) is the allocation x_k to activity k at which the maximum f_k(x) in Equation I-6 is attained.
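The recursion of Equation I-6 can be sketched in a few lines of Python. This is a minimal illustration, not the program of the report: the function name dp_tables, the list-of-lists layout, and the toy return values are assumptions made for the example, and allocations are expressed as grid indices rather than amounts.

```python
def dp_tables(returns):
    """Build the f_k and x_k tables over grid indices 0..m-1.

    returns[k][j] holds the return of activity k+1 for an allocation of
    j grid steps (hypothetical layout chosen for this sketch).
    """
    m = len(returns[0])
    f_prev = [0.0] * m                  # f_0(x) = 0, Equation I-5
    f_tables, x_tables = [], []
    for g in returns:
        f_cur, x_cur = [], []
        for j in range(m):              # resource x = j grid steps
            # Equation I-6: try every split of j between this activity
            # (i steps) and the earlier activities (j - i steps).
            best = max(range(j + 1), key=lambda i: g[i] + f_prev[j - i])
            f_cur.append(g[best] + f_prev[j - best])
            x_cur.append(best)          # first maximizer when ties occur
        f_tables.append(f_cur)
        x_tables.append(x_cur)
        f_prev = f_cur
    return f_tables, x_tables


# Toy data: two activities on a three-point grid.
f, x = dp_tables([[0, 3, 4], [0, 2, 5]])
print(f[-1], x[-1])
```

Each pass of the outer loop is one maximization over a single variable, which is the reduction from one function of n variables to n functions of one variable described above.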

The heart of the dynamic programming technique, then, is the construction of the sequences (I-3) and (I-7). As mentioned above, computational considerations require the discretizing of the range [0, x], say into the partition

    0 = t_0 < t_1 < t_2 < \cdots < t_n = x ,    (I-8)

where t_i = iΔ for some fixed Δ. A partition such as (I-8) often is denoted compactly by "a(Δ)b," which is read "from a to b in steps of Δ."
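A partition a(Δ)b can be generated as follows. The helper name partition is hypothetical; the sketch computes each point from an integer index instead of adding Δ repeatedly, so that rounding error does not accumulate along the grid.

```python
def partition(a, delta, b):
    """Return the points a, a + delta, ..., b of the partition a(delta)b."""
    steps = round((b - a) / delta)
    if abs(a + steps * delta - b) > 1e-9:
        raise ValueError("b - a is not a whole number of steps")
    return [a + i * delta for i in range(steps + 1)]


grid = partition(0.0, 0.1, 2.0)   # the partition 0(0.1)2
print(len(grid))
```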

The calculation of the sequences (I-3) and (I-7) over the partition (I-8) requires that {g_i(x)} be calculated over the partition. For illustration of the dynamic programming technique, consider the maximizing of the following:

    R_6(x_1, x_2, x_3, x_4, x_5, x_6) = x_1 + x_2^2 + x_3^{1/2} + 2 \sin x_4 + g_5(x_5) + g_6(x_6) ,    (I-9)

where

    g_5(x_5) = 2 x_5        if 0 \le x_5 \le 1 ,
             = 4 - 2 x_5    if 1 \le x_5 \le 2 ,

and

    g_6(x_6) = 2 [x_6] ,

subject to the constraint

    x_1 + x_2 + x_3 + x_4 + x_5 + x_6 = x ,    (I-10)

with [x] denoting the greatest integer in x.

This is to be done for each x = 0(0.1)2; that is, for x from 0 to 2 in steps of 0.1. A flow chart for a sequential dynamic programming solution of this problem is given in Figure I-1. For any solution model, it will be necessary to evaluate the set {g_i(x_i)} at each point of some partition. For the present problem, the partition 0(0.1)2 is chosen, and the functional values are recorded in Table I-1. From this table, the sequences f_1(x), f_2(x), ..., f_6(x) and x_1(x), x_2(x), ..., x_6(x) can be determined and recorded as in Table I-2. Recalling that x_k(x) is not necessarily unique, note that Table I-1 contains the information necessary to determine all optimal policies as indicated in Table I-2.
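The activity returns of the kind recorded in Table I-1 can be regenerated from the six activity functions of Equation I-9. The sketch below is illustrative only: the names ACTIVITIES and activity_table are assumptions, and the values are printed to two decimals as a display choice, not as a claim about the precision of the original table.

```python
import math

STEPS = 20  # the partition 0(0.1)2 has 21 points, indexed 0..20

# g_1 ... g_6 of Equation I-9.
ACTIVITIES = [
    lambda x: x,
    lambda x: x * x,
    math.sqrt,
    lambda x: 2.0 * math.sin(x),
    lambda x: 2.0 * x if x <= 1.0 else 4.0 - 2.0 * x,
    lambda x: 2.0 * math.floor(x),
]


def activity_table():
    """Rows are grid points t = j/10; columns are g_1(t) ... g_6(t)."""
    return [[g(j / 10) for g in ACTIVITIES] for j in range(STEPS + 1)]


table = activity_table()
for j, row in enumerate(table):
    print(f"{j / 10:3.1f}  " + "  ".join(f"{v:5.2f}" for v in row))
```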

Now consider a method for reading out an optimal policy for a given resource from Table I-2. The method is simply this: Given a resource x (0 ≤ x ≤ 2), select x_6(x), with x_6 the allocation for g_6(x). Now select x_5(x - x_6), which is just x_5, the allocation for g_5(x). Next select x_4(x - x_6 - x_5), which is just x_4 for g_4(x), and so forth until the allocations x_1, x_2, ..., x_6 are determined for the activities g_1(x_1), ..., g_6(x_6). A flow chart for the readout method is given in Figure I-2; this chart ignores multiple solutions, but all solutions are indicated in Table I-2.

As an example of the readout process, suppose that x = 2.0. Then from Table I-2 it can be seen that the following allocations for (x_1, x_2, x_3, x_4, x_5, x_6) yield the return f_6(2) = 4.12: (0, 0, 0.1, 0, 0.9, 1.0), (0, 0, 0.1, 0.1, 0.8, 1.0), (0, 0, 0.1, 0.2, 0.7, 1.0), (0, 0, 0.1, 0.3, 0.6, 1.0). This multiplicity of optimal policies is a result of the nonuniqueness of {x_k(x)}.

The solution model for the dynamic programming problem considered here specified the calculation of each of the functions f_1(x), f_2(x), ..., f_6(x) over the range 0(0.1)2. Hence, if only the first k activities, k ≤ 6, are to be considered, the optimal policy for a given x can be read out easily from Table I-2. For example, let x = 1.5 and k = 4. Now from Table I-2, f_4(1.5) = 2.43, which is achieved by the following allocations for (x_1, x_2, x_3, x_4): (0.3, 0, 0.2, 1.0), (0.2, 0, 0.3, 1.0), (0.2, 0, 0.2, 1.1), and (0.1, 0, 0.3, 1.1).
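The readout described above can be sketched as follows. The names build_tables and read_out are hypothetical, the tables are built with full double-precision arithmetic, and allocations are expressed in tenths of a unit (grid indices). The sketch follows a single maximizer at each stage, as the flow chart of Figure I-2 is described as doing; the several allocations listed in the text presumably tie only at the two-decimal precision of the tabulated returns.

```python
import math

# g_1 ... g_6 of Equation I-9.
G = [
    lambda x: x,
    lambda x: x * x,
    math.sqrt,
    lambda x: 2.0 * math.sin(x),
    lambda x: 2.0 * x if x <= 1.0 else 4.0 - 2.0 * x,
    lambda x: 2.0 * math.floor(x),
]
M = 21  # grid points 0.0, 0.1, ..., 2.0, indexed 0..20


def build_tables():
    """Return (f_tabs, x_tabs): f_k and x_k over the grid, k = 1..6."""
    f_prev, f_tabs, x_tabs = [0.0] * M, [], []
    for g in G:
        vals = [g(j / 10) for j in range(M)]
        f_cur, x_cur = [], []
        for j in range(M):
            best = max(range(j + 1), key=lambda i: vals[i] + f_prev[j - i])
            f_cur.append(vals[best] + f_prev[j - best])
            x_cur.append(best)
        f_tabs.append(f_cur)
        x_tabs.append(x_cur)
        f_prev = f_cur
    return f_tabs, x_tabs


def read_out(x_tabs, j, k):
    """Allocations (in tenths) to activities 1..k for a resource of j tenths."""
    alloc = [0] * k
    for stage in range(k - 1, -1, -1):   # activity k first, then k-1, ...
        alloc[stage] = x_tabs[stage][j]
        j -= alloc[stage]                # pass the remainder down
    return alloc


f_tabs, x_tabs = build_tables()
print(round(f_tabs[5][20], 2), read_out(x_tabs, 20, 6))   # x = 2.0, k = 6
print(round(f_tabs[3][15], 2), read_out(x_tabs, 15, 4))   # x = 1.5, k = 4
```

Because every f_k table is kept, the same tables serve any k ≤ 6 without recomputation, which is the point made in the preceding paragraph.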

The solution model described above and outlined in Figure I-1 for the solution of an optimization problem by dynamic programming is sequential in nature. It involved computing first g_1(t) for t = 0(0.1)2, then g_2(t) for t = 0(0.1)2, and so forth until g_6(t) was computed. These results make up Table I-1. From this table, the sequences f_1(x), f_2(x), ..., f_6(x) and x_1(x), x_2(x), ..., x_6(x) were computed sequentially for x = 0(0.1)2.

Finally, a readout process was specified for determining optimal allocations for a given resource. The whole solution model was sequential, but the method of dynamic programming itself is not essentially sequential in nature.

Now it will be indicated how parallel aspects of the dynamic programming method may be exploited in a solution model similar to the one described above.

TABLE I-1 - ACTIVITY RETURNS FOR EQUATION I-9

TABLE I-2 - SEQUENTIAL MAXIMIZATION OF EQUATION I-9

Figure I-2 - Sequential Optimal Allocation Readout

2. PARALLEL SOLUTION MODEL

Consider the problem of maximizing, by dynamic programming techniques, the function

    R_{2N}(x_1, x_2, \ldots, x_{2N}) = \sum_{i=1}^{2N} g_i(x_i) ,    (I-11)

where 2N variables are assumed for convenience. Let

    x_1 + x_2 + \cdots + x_N = y_1 ,    (I-12)

    x_{N+1} + x_{N+2} + \cdots + x_{2N} = y_2 ,    (I-13)

with

    y_1 + y_2 = x ;

    U_N(y_1) = \max_{(x_1, x_2, \ldots, x_N)} [ g_1(x_1) + g_2(x_2) + \cdots + g_N(x_N) ] ,    (I-14)

    V_N(y_2) = \max_{(x_{N+1}, x_{N+2}, \ldots, x_{2N})} [ g_{N+1}(x_{N+1}) + g_{N+2}(x_{N+2}) + \cdots + g_{2N}(x_{2N}) ] .    (I-15)

Now U_N(y_1) and V_N(y_2) may be computed independently and thus in parallel, using Equation I-6. This equation then can be used to maximize the sum U_N(y_1) + V_N(y_2) over

    { (y_1, y_2) | y_1 + y_2 = x } .

Since computer time is approximately proportional to the number of variables, the sequential solution time for the maximization of Equation I-11 is proportional to 2N; the corresponding parallel solution time is proportional to N + 2. If 2^N variables were involved, the 2^{N-1} pairs first could be processed in parallel, then the resulting 2^{N-2} pairs, and so on. The sequential solution time for the case of 2^N variables would be proportional to 2^N; the parallel solution time would be proportional to 2N.
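The split into U_N and V_N can be sketched with two concurrent tasks. The names combine and split_maximize and the toy return tables are assumptions for illustration. Python threads are used only to show that the two halves are independent; they do not by themselves shorten the running time of pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def combine(a, b):
    """c[j] = max over i of a[i] + b[j - i] (one application of I-6)."""
    return [max(a[i] + b[j - i] for i in range(j + 1)) for j in range(len(a))]


def split_maximize(tables):
    """Maximize the sum of all activities by solving two halves concurrently."""
    half = len(tables) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        u = pool.submit(reduce, combine, tables[:half])   # U_N(y_1)
        v = pool.submit(reduce, combine, tables[half:])   # V_N(y_2)
        # Final step: maximize U_N(y_1) + V_N(y_2) over y_1 + y_2 = x.
        return combine(u.result(), v.result())


# Toy data: four activities tabulated on a three-point grid.
tables = [[0, 1, 4], [0, 3, 3], [0, 2, 5], [0, 1, 1]]
parallel = split_maximize(tables)
sequential = reduce(combine, tables)
print(parallel, sequential)
```

Integer returns are used in the toy data so that the two orders of combination agree exactly; with floating-point returns the results may differ in the last bits.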

In addition to parallel aspects of the maximization of (I-11), parallelism exists at the lowest-level computations (fundamental and subroutine type computations). For example, initially the 2N vectors [g_i(0), g_i(Δ), g_i(2Δ), ..., g_i(x)], i = 1, 2, ..., 2N can be computed in parallel.

As a specific example of the injection of parallelism into a dynamic programming solution model, consider now the example problem introduced above; that is, the maximization of (I-9) under the constraint (I-10).

The several activity functions of Equation I-9 are independent of one another and hence can be calculated in parallel on a parallel processor. However, efficient use of a parallel processor prohibits parallel computations that contribute little or nothing to improved solution speeds, and hence tax machine capacity unnecessarily. Since the calculation of the sequences (I-3) and (I-7) is the ultimate goal of the dynamic programming technique, values for the activity functions {g_i(t)} need not be calculated prior to the time when the values are needed in the computation of (I-3) and (I-7).


Consider a partition x_0(Δ)x to be used in the dynamic programming maximization of a function of type (I-1).

In general,

    f_k(x_0 + j\Delta) = \max_{0 \le i \le j} [ g_k(x_0 + i\Delta) + f_{k-1}(x_0 + (j - i)\Delta) ] .    (I-16)

Hence, to calculate f_k(x_0 + jΔ) only the following values must be known:

    f_{k-1}(x_0), f_{k-1}(x_0 + \Delta), \ldots, f_{k-1}(x_0 + j\Delta)

and

    g_k(x_0), g_k(x_0 + \Delta), \ldots, g_k(x_0 + j\Delta) ,    (I-17)

and the calculation of the sequences (I-3) and (I-7) can in fact be carried on in parallel with the calculation of the activity functions.
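The dependency stated in (I-16) and (I-17) can be made visible with lazily chained generators: each stage emits f_k at grid point j as soon as the j-th values of g_k and f_{k-1} have arrived, without waiting for the rest of either sequence. This is a sketch under assumptions; the names stage and pipeline and the toy data are not from the report, and the generators run in one thread, so they show the data dependencies rather than true parallel execution.

```python
from itertools import repeat


def stage(g_stream, f_prev_stream):
    """Yield f_k at grid point j as soon as the j-th inputs have arrived."""
    g_seen, f_seen = [], []
    for g_j, f_j in zip(g_stream, f_prev_stream):
        g_seen.append(g_j)
        f_seen.append(f_j)
        j = len(g_seen) - 1
        # Equation I-16 uses only entries 0..j of each input sequence.
        yield max(g_seen[i] + f_seen[j - i] for i in range(j + 1))


def pipeline(activity_streams):
    """Chain one stage per activity; nothing is computed until values are pulled."""
    f = repeat(0)                      # f_0 = 0 at every grid point
    for g in activity_streams:
        f = stage(g, f)
    return f


g1 = iter([0, 3, 4])
f_last = pipeline([g1, iter([0, 2, 5])])
first = next(f_last)        # pulls exactly one value from each activity
remaining_g1 = list(g1)     # the rest of g_1 has not been touched yet
print(first, remaining_g1)
```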

Now for the parallel execution of the dynamic programming maximization of (I-9), make the following definitions:

    u_1(x) = \max_{0 \le y \le x} [ g_2(y) + g_1(x - y) ] ,
    y_1(x) = y at which the maximum occurs;    (I-18)

    u_2(x) = \max_{0 \le y \le x} [ g_4(y) + g_3(x - y) ] ,
    y_2(x) = y at which the maximum occurs;    (I-19)

    u_3(x) = \max_{0 \le y \le x} [ g_6(y) + g_5(x - y) ] ,
    y_3(x) = y at which the maximum occurs;    (I-20)

    u_4(x) = \max_{0 \le y \le x} [ u_2(y) + u_1(x - y) ] ,
    y_4(x) = y at which the maximum occurs;    (I-21)

    u_5(x) = \max_{0 \le y \le x} [ u_4(y) + u_3(x - y) ] ,
    y_5(x) = y at which the maximum occurs.    (I-22)

Consider the partition 0(0.1)2 in terms of x_0 = 0 and Δ = 0.1, and then x_i = x_0 + iΔ, i = 0, 1, 2, ..., 20. A chart can be constructed showing the level-by-level parallel execution of the dynamic programming maximization of (I-9), as in Table I-3. It will be noted that in Table I-3 a level corresponds to a new stage of computation for the activity functions {g_i(t)} and return functions {u_i(t)} over the partition t = x_0(Δ)x = 0(0.1)2. Computation of the sequence y_i(t), i = 1, 2, ..., 5 has not been indicated, but the required values are an immediate consequence of the calculation of the sequence u_i(t), i = 1, 2, ..., 5.

As shown in Table I-3, the partition x_0(Δ)x and activity functions g_i(x), i = 1, 2, ..., 6 over the partition can be calculated in 21 levels through parallel computation. The same computations, performed in a sequential manner, would require 141 levels. A further indication of the parallel characteristics of the dynamic programming technique and the power of parallel processing is seen in that only three additional levels of computation allow the complete maximization of (I-9) to be effected. Sequential techniques would require 110 additional levels of computation. Hence parallel techniques offer a total advantage of 24 levels to 251 (21 + 3 against 141 + 110) for the problem at hand. The difference in computational levels required by parallel and sequential models indicated here, striking as it is, only begins to point out the increased computational speed offered by parallel execution of the dynamic programming technique, since no appeal has been made to parallel execution of basic machine instructions effecting the individual computational levels.

The results of the parallel dynamic programming computation for Equation I-9 are given in Table I-4. A generalized readout process is given in Figure I-3. A specific readout for x = 2.0 is given in Figure I-4.

TABLE I-3 - PARALLEL SOLUTION MODEL

Each entry is the partition point at which the function heading its column is evaluated at that level.

Level  Partition          g_1   g_2   g_3   g_4   g_5   g_6   u_1   u_2   u_3   u_4   u_5
 ...   ...                ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
 10    x_11 = x_10 + Δ    x_10  x_10  x_10  x_10  x_9   x_9   x_9   x_9   x_8   x_8   x_7
 11    x_12 = x_11 + Δ    x_11  x_11  x_11  x_11  x_10  x_10  x_10  x_10  x_9   x_9   x_8
 12    x_13 = x_12 + Δ    x_12  x_12  x_12  x_12  x_11  x_11  x_11  x_11  x_10  x_10  x_9
 13    x_14 = x_13 + Δ    x_13  x_13  x_13  x_13  x_12  x_12  x_12  x_12  x_11  x_11  x_10
 14    x_15 = x_14 + Δ    x_14  x_14  x_14  x_14  x_13  x_13  x_13  x_13  x_12  x_12  x_11
 15    x_16 = x_15 + Δ    x_15  x_15  x_15  x_15  x_14  x_14  x_14  x_14  x_13  x_13  x_12
 16    x_17 = x_16 + Δ    x_16  x_16  x_16  x_16  x_15  x_15  x_15  x_15  x_14  x_14  x_13
 17    x_18 = x_17 + Δ    x_17  x_17  x_17  x_17  x_16  x_16  x_16  x_16  x_15  x_15  x_14
 18    x_19 = x_18 + Δ    x_18  x_18  x_18  x_18  x_17  x_17  x_17  x_17  x_16  x_16  x_15
 19    x_20 = x_19 + Δ    x_19  x_19  x_19  x_19  x_18  x_18  x_18  x_18  x_17  x_17  x_16
 20                       x_20  x_20  x_20  x_20  x_19  x_19  x_19  x_19  x_18  x_18  x_17
 21                                               x_20  x_20  x_20  x_20  x_19  x_19  x_18
 22                                                                       x_20  x_20  x_19
 23                                                                                   x_20

g_i(x) (with i = 1, 2, ..., 6) as defined in Equation I-9.
u_i(x) (with i = 1, 2, ..., 5) as defined in Equations I-18 through I-22.

In Table I-4, the maximum possible return of u_5(2) = 4.12 is achieved for the following allocations to (x_1, x_2, x_3, x_4, x_5, x_6): (0, 0, 0.1, 0.3, 0.6, 1.0), (0, 0, 0.1, 0.2, 0.7, 1.0), (0, 0, 0.1, 0.1, 0.8, 1.0), and (0, 0, 0.1, 0, 0.9, 1.0). These allocations agree, of course, with those from the sequential computation.

For a resource of x = 1.6, the maximum return of u_4(1.6) = 2.57 occurs for the allocation (0, 1.5, 0.1, 0) if only the first four activities are considered.
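The pairwise definitions (I-18) through (I-22) and the quoted returns can be checked with a short sketch. The helper names tabulate, pair, read_four, and read_six are hypothetical; allocations are in tenths of a unit, and u_1, u_2, u_3 are written sequentially here although they are independent of one another and could be evaluated concurrently.

```python
import math

M = 21  # grid indices 0..20 for x = 0.0, 0.1, ..., 2.0


def tabulate(g):
    return [g(j / 10) for j in range(M)]


def pair(a, b):
    """Return (u, y): u[j] = max_i a[i] + b[j - i], y[j] = maximizing i."""
    u, y = [], []
    for j in range(M):
        best = max(range(j + 1), key=lambda i: a[i] + b[j - i])
        u.append(a[best] + b[j - best])
        y.append(best)
    return u, y


g1 = tabulate(lambda x: x)
g2 = tabulate(lambda x: x * x)
g3 = tabulate(math.sqrt)
g4 = tabulate(lambda x: 2.0 * math.sin(x))
g5 = tabulate(lambda x: 2.0 * x if x <= 1.0 else 4.0 - 2.0 * x)
g6 = tabulate(lambda x: 2.0 * math.floor(x))

u1, y1 = pair(g2, g1)   # I-18
u2, y2 = pair(g4, g3)   # I-19
u3, y3 = pair(g6, g5)   # I-20
u4, y4 = pair(u2, u1)   # I-21
u5, y5 = pair(u4, u3)   # I-22


def read_four(j4):
    """Allocations (x_1..x_4) in tenths for j4 tenths given to activities 1-4."""
    j2 = y4[j4]                 # share of u_2 (activities 3 and 4)
    j1 = j4 - j2                # share of u_1 (activities 1 and 2)
    x2, x4 = y1[j1], y2[j2]
    return [j1 - x2, x2, j2 - x4, x4]


def read_six(j):
    """Allocations (x_1..x_6) in tenths for a total resource of j tenths."""
    j4 = y5[j]                  # share of u_4 (activities 1-4)
    x6 = y3[j - j4]
    return read_four(j4) + [j - j4 - x6, x6]


print(round(u5[20], 2), read_six(20))    # x = 2.0, all six activities
print(round(u4[16], 2), read_four(16))   # x = 1.6, first four activities
```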

3. CONCLUSIONS

Dynamic programming has been introduced and illustrated by a specific example. The dynamic programming technique was examined for sequential and parallel characteristics. Parallel characteristics were noted and found to provide a basis for significant increases in processing speed. The results indicate that construction of efficient solution models for parallel processors depends heavily on analysis of the problem and machine at hand so that machine capacity is not unnecessarily taxed in parallel computations that improve solution speeds very little if at all.

TABLE I-4 - PARALLEL MAXIMIZATION OF EQUATION I-9

THE NOTATION (q, f_i) MEANS THAT THE RESOURCE IN THE AMOUNT q IS ALLOCATED TO ACTIVITY i.


APPENDIX  II  -  PROGRAMMING  OF  THE  DYNAMIC  PROGRAMMING 


TECHNIQUE  FOR  THE  IBM  7090  (SEQUENTIAL) 


1. INTRODUCTION

A dynamic programming problem was programmed for a standard general-purpose computer and for Machine I to compare the operation of the two computers. The IBM 7090 was chosen as the standard.

In programming the problem, no input-output operations are performed. The program is assumed to be available in storage. The results are stored in tables according to the table layout diagram (see page 42). To obtain a recommended resource assignment for a given number of activities, the input data, namely the number of activities to be considered (N) and the quantity of resource to be assigned (x_0), are assumed to be in storage prior to starting the lookup routine. After the lookup routine is executed, the recommended resource assignment to each activity is found in the θ output table.

Minimum and maximum program execution times are listed. These were determined from the quoted minimum and maximum instruction execution times in the IBM 7090 Programmer's Reference Manual.

The activity function program controls the execution of the individual activity functions. In this problem, 21 returns from each of the following six activity functions are calculated:

    g_1(x) = x ,
    g_2(x) = x^2 ,
    g_3(x) = \sqrt{x} ,
    g_4(x) = 2 \sin x ,
    g_5(x) = 2x        if 0 \le x \le 1 ,
           = 4 - 2x    if 1 \le x \le 2 ,
    g_6(x) = 2 [x] .

The  returns  for  each  function  are  stored  in  a  table.  When  the  returns  for 
all  activity  functions  have  been  obtained,  the  maximization  routine  is  exe¬ 
cuted.  The  maximization  routine  examines  the  return  table  for  each  ac¬ 
tivity  function  and  a  table  of  maximum  returns  for  all  previous  activity 
functions.  Then,  given  some  resource,  the  returns  from  all  resource 
combinations  as  applied  to  the  current  activity  and  to  all  previous  activi¬ 
ties  are  calculated.  The  greatest  return  is  obtained  and  stored.  The 
quantity  of  resource  that  generated  this  maximum  return  also  is  stored. 
When  all  quantities  of  resource  have  been  tested  against  all  activities, 
the  result  is  a  series  of  best  policy  tables. 

The  best  policy  tables  contain  for  a  given  quantity  of  resource  the  portion 
of  that  resource  that  should  be  assigned  to  the  respective  activity.  The 
remainder  of  the  resource  is  then  to  be  allotted  to  the  remaining  activities 
in  the  same  manner. 

The lookup routine has as an input the quantity of resource to be assigned to the given activities. The best policy table for the higher-order activity is examined. The entry in the table corresponding to the resource to be assigned is examined and a recommended assignment obtained. The remaining resource is then applied to the lower-order activities in the same manner. The result is a recommended assignment of resource to the activities that will generate a maximum return.

2.  ACTIVITY  FUNCTION  RETURNS 
a.  General 

The following are initialized: x_max, N, and Δ, with x_max the maximum resource that can be allocated to any activity, N the number of activity functions, in this case six, and Δ the resource increment, in this case 0.1.

The activity counter, k, is set to 1 and the returns for resource allocations varying from 0 to 2 in steps of 0.1 are calculated. When the assigned resource, x, reaches the maximum resource available, x_max, the activity counter is increased by 1 and returns for the second activity are calculated.

When the activity counter has reached N = 6, the returns from all activities have been computed and stored in tables g_k(x).

b. Activity Function 1

The return from activity function 1, g_1(x) = x, for a resource assignment of x is simply x.

c. Activity Function 2

The return from activity function 2, g_2(x) = x^2, for a resource assignment of x is the square of the assigned resource, x^2.

d.  Activity  Function  3 

The return from activity function 3, g_3(x) = \sqrt{x}, for a resource assignment of x is the square root of the assigned resource. The resource is represented in the computer as a floating point number. An initial guess for the square root is made,

    G_k = \frac{1}{2} (f + 1) ,

where f is the fractional part of the floating point number. This guess is the input to the iterative portion of the routine.

After three iterations, where successive approximations are obtained from

    G_{k+1} = \frac{1}{2} \left( \frac{f}{G_k} + G_k \right) ,

the exponent of the floating point number is tested to determine the exponent of the square root. Program control is then returned to the activity function program.
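The square root routine can be sketched as follows. The initial guess and the three Newton iterations follow the description above; the handling of the exponent (folding a factor of 2 into the fraction when the exponent is odd, then halving the exponent) is an assumption of this sketch, since the text only says that the exponent is tested. The function name newton_sqrt is hypothetical, and Python's binary frexp and ldexp stand in for the 7090 floating point format.

```python
import math


def newton_sqrt(x, iterations=3):
    """Approximate sqrt(x) for x >= 0 with a fixed number of Newton steps."""
    if x == 0.0:
        return 0.0
    f, e = math.frexp(x)        # x = f * 2**e with 0.5 <= f < 1
    if e % 2:                   # assumed step: make the exponent even
        f, e = 2.0 * f, e - 1
    g = 0.5 * (f + 1.0)         # initial guess from the fraction
    for _ in range(iterations):
        g = 0.5 * (f / g + g)   # Newton update on the fraction
    return math.ldexp(g, e // 2)


worst = max(abs(newton_sqrt(j / 10) - math.sqrt(j / 10)) for j in range(21))
print(worst)
```

With the fraction confined to [0.5, 2), the initial guess is within a few percent of the root, so three iterations are enough for the error to fall far below the two-decimal precision quoted for the returns.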

e. Activity Function 4

The return from activity function 4, g_4(x), for a resource assignment of x is 2 sin x. The input to the series is R, the residue of x mod 2π. The sign of the sine is determined and quadrant correction of R is performed. Then

    a = \frac{2}{\pi} R

is calculated and used as input to the series; a and a^2 are calculated and stored. The nested series approximation,

    a (C_1 + a^2 (C_3 + a^2 (C_5 + a^2 C_7))) ,

is computed, the sign added, and the result stored. Program control is transferred back to the activity function program.
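The sine routine can be sketched with the coefficients listed in the decimal column of Table II-6. The quadrant correction shown here and the scaling a = (2/π)R are assumptions consistent with the constants 2π, 3π/2, π, π/2, and 2/π in that table; the function names are hypothetical.

```python
import math

# Coefficients as listed in Table II-6 (decimal column).
C1, C3, C5, C7 = 1.57079630, -0.64596371, 0.07968968, -0.00467377


def series_sine(x):
    """Approximate sin(x): reduce to the first quadrant, then a nested series."""
    r = math.fmod(x, 2.0 * math.pi)     # residue of x mod 2*pi
    if r < 0.0:
        r += 2.0 * math.pi
    sign = 1.0
    if r > math.pi:                     # third and fourth quadrants
        sign, r = -1.0, r - math.pi
    if r > math.pi / 2.0:               # second quadrant: reflect about pi/2
        r = math.pi - r
    a = r * (2.0 / math.pi)             # scaled argument, 0 <= a <= 1
    a2 = a * a
    return sign * a * (C1 + a2 * (C3 + a2 * (C5 + a2 * C7)))


def g4(x):
    """Return of activity function 4."""
    return 2.0 * series_sine(x)


worst = max(abs(g4(j / 10) - 2.0 * math.sin(j / 10)) for j in range(21))
print(worst)
```

The nested form needs only four multiplications by a^2 and three additions once a^2 is stored, which is why a and a^2 are computed first.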

f.  Activity  Function  5 

The return from activity function 5, g_5(x), for a resource assignment of x is 2x for 0 ≤ x ≤ 1 and 4 - 2x for 1 ≤ x ≤ 2.

g. Activity Function 6

The return from activity function 6, g_6(x) = 2[x], for a resource assignment of x is twice the largest integer equal to or less than x.

3.  MAXIMIZATION 

The activity counter k is initially set to 1. The f_{k-1}(x) table is initially zeroed since the maximized returns from activity 0 are zero. The resource to be maximized, x, is initially assigned at 0. A storage location, β, containing intermediate maximum returns, is initially set at a maximum negative value to enable acceptance of the first return. x is the resource to be assigned, under the problem constraints, to activity k and to prior activities whose maximized returns for an assigned resource are the entries of the f_{k-1}(x) table.

Index i = x_k/Δ is used to determine the return from an assignment of resource x_k to activity k. Index j = x/Δ - i is used to determine the return from an assignment of resource x - x_k to prior activities.

The sum of the returns, α, is stored and compared with any previous return for the given resource, β. If the current return is larger than the previous maximum return, β is replaced by α and the allocated resource that generated this return is stored in γ. If the current return is equal to or less than the previous maximum return, β and γ remain unchanged.

Then x_k is incremented by Δ and if it is less than or equal to the current maximum resource to be tested, a new set of indices, i and j, are calculated and the above process repeated.

When x_k is greater than x, indicating that all combinations of resource x_k subject to the constraints have been used to determine the maximum return, then the value of the maximum return and the amount of resource that generates this return are stored in the f_k(x) return table and x_k(x) policy table, respectively.

Then x is increased by Δ and tested against x_max. If x is equal to or less than x_max, all combinations of x resource allocations consistent with the problem constraints are tested. If x is greater than x_max, then the f_k(x) table is moved to the f_{k-1}(x) table, and k is increased by 1 and tested.

If k is equal to or less than N, then the maximizing process is repeated.

If k is greater than N, the procedure stops.

At  this  time,  return  tables  and  policy  tables  are  available  in  storage. 


4.  LOOKUP 

In the lookup routine, k is initialized with N, the number of activities. The resource to be allocated is stored in x. The x-th entry of the policy table for the k-th activity is obtained and stored in table θ. x is decreased by the amount that is assigned to activity k. The diminished x is then applied to the k - 1 activity and a recommended policy obtained. This process is repeated for the remaining activities. The end result is a table of recommended resource assignments to the k activities.

5.  CONCLUSIONS 

The sequential execution of the sample dynamic programming problem takes 0.151 to 0.224 sec when programmed for an IBM 7090 (see Appendix III for Machine I parallel execution). Approximately 570 words of memory are used for program location and table storage.

6. FLOW CHARTS AND PROGRAM TABLES

Figures II-1 through II-5 are the flow charts for the activity functions. Figure II-6 is the flow chart for maximization and Figure II-7 is the lookup function flow chart.

Table II-1 shows the execution time for the dynamic programming technique on the IBM 7090. Tables II-2 through II-5 show, respectively, the programs for activity function control, maximization, the activity functions, and lookup. Table II-6 shows the common storage.

Figure II-2 - Activity Functions 1 and 2 Flow Chart, IBM 7090

Figure II-6 - Maximization Function Flow Chart, IBM 7090 (Sheet 1)

Figure II-6 - Maximization Function Flow Chart, IBM 7090 (Sheet 3)

TABLE II-1 - IBM 7090 EXECUTION TIME FOR DYNAMIC PROGRAMMING PROBLEM

                                        Machine cycles
Item                                Minimum      Maximum
Activity function                     1,056        1,056
g_1(x)                                  127          127
g_2(x)                                  169          400
g_3(x)                                1,723        3,130
g_4(x)                                1,369        1,831
g_5(x)                                  313          544
g_6(x)                                  154          212
Maximization                         64,398       95,430
Lookup                                  176          176
Total machine cycles                 69,485      102,906
Microseconds per machine cycle         2.18         2.18
Total microseconds                  151,477      224,335
Total seconds                         0.151        0.224
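The totals of Table II-1 can be checked by summing the per-item cycle counts and multiplying by the cycle time. The dictionary layout and names below are assumptions of this sketch; the numbers are those of the table.

```python
# (minimum, maximum) machine cycles per item, copied from Table II-1.
CYCLES = {
    "activity function": (1056, 1056),
    "g1": (127, 127),
    "g2": (169, 400),
    "g3": (1723, 3130),
    "g4": (1369, 1831),
    "g5": (313, 544),
    "g6": (154, 212),
    "maximization": (64398, 95430),
    "lookup": (176, 176),
}
MICROSECONDS_PER_CYCLE = 2.18

total_cycles = [sum(item[i] for item in CYCLES.values()) for i in (0, 1)]
total_seconds = [c * MICROSECONDS_PER_CYCLE * 1e-6 for c in total_cycles]
print(total_cycles, [round(s, 3) for s in total_seconds])
```

The maximization routine accounts for more than nine tenths of the cycles in both columns, so it is the part of the program where a parallel organization has the most to offer.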

TABLE II-2 - ACTIVITY FUNCTION CONTROL PROGRAM, IBM 7090


Item 

Instruction 

Remarks 

Machine 

cycles 

INIT 

LXA 

2 

♦SIX 

<T3)  = 

INIT6 

LXA 

1 

TWONE 

(T  1 )  =  21d 

2 

INIT  2 

CLA 

r  4 

<T4)  =  c 

2 

INIT5 

STO 

X 

fx)  =  C  or  x  *  A 

■» 

IB  A 

l 

A  DDR 

! 

I 

INIT  ( 

|  C  LA 

X 

x  x  *  A 

2 

j  ADD 

a, 

T  J 

2 

j  C  AS 

X 

J 

mu 

I  R  A 

INIT  4 

> 

1 

IK  A 

!NiT5 

« 

1 

t  BA 

INIT  5 

1 

:ni  r< 

MX 

INI  I  t- 

12  12-1 

2 
TABLE II-3 - MAXIMIZATION FUNCTION PROGRAM, IBM 7090

Item    Instruction                   Remarks                    Machine cycles
        CLA       ONE                                            2
        STO       k                   k = 1, activity number     2
        LXA  2    TWONE               (I2) = 21                  2
RET1    STZ  2    f_{k-1}(x) TAB      0 → f_{k-1}(x) table       2
        TIX  2,1  RET1                                           2
RET9    STZ       x                   0 → x, resource            2
RET2    CLA       -∞                  77 77 77 77 77 77          2
        STO       β                                              2
        STZ       x_k                                            2
RET5    CLA       x_k                                            2
        DVH       Δ                   x_k/Δ                      3-14
        STQ       IND1                = i                        2
        CLA       x                                              2
        DVH       Δ                   x/Δ                        3-14
        XCA                           (AC) = x/Δ                 1
        SUB       IND1                                           2
        STO       IND2                = j                        2
        LXA  2    IND1                (I2) = i                   2
        LXA  4    IND2                (I4) = j                   2
        CLA  2    g_k TAB                                        2
        ADD  4    f_{k-1} TAB         (AC) = α                   2
        CAS       β                                              3
        TRA       RET3                >                          1
        TRA       RET4                =                          1
        TRA       RET4                <                          1
RET3    STO       β                   (β) = α                    2
        CLA       x_k                                            2
        STO       γ                   (γ) = x_k                  2
RET4    CLA       x_k                                            2

TABLE II-3 - MAXIMIZATION FUNCTION PROGRAM, IBM 7090 (Continued)


Instruction 


Remarks 


Machine 

cycles 


ADD 

A 

STO 

Xk 

CAS 

X 

TRA 

RET6 

TRA 

RET  5 

TRA 

RET  5 

CLA 

0 

STO 

2 

fk(x)  TAB 

CLA 

Y 

STO 

2 

xk(x)  TAB 

CLA 

X 

ADD 

A 

STO 

X 

CAS 

X 

max 

TRA 

OUT 

TRA 

RET2 

TRA 

RET2 

"OUTPUT" 

LXA 

2 

+TWCNE 

CLA 

2 

fk(x)  TAB 

STA 

2 

£k  -  !<*>  T 

TIX 

2.  1 

RET7 

CLA 

k 

ADD 

ONE 

CAS 

TRA 

TRA 

STO 

TRA 


N 

RET9 
RET  8 
k 

RET9 


xk  *  xk  + 


Best  returns 


x  =  x  +  A 
TABLE II-4 - ACTIVITY FUNCTIONS PROGRAM, IBM 7090 (Continued)


Machine 

Item 

Instruction 

Remarks 

cycles 

CAS 

TWO 

3 

TRA 

A2 

> 

1 

TRA 

A1 

= 

1 

A1 

STO 

1 

G5TAB* 

< 

2 

TIX 

1,  1 

INIT3 

2 

TRA 

INIT3 

1 

A2 

STO 

TEM 

2 

CLA 

FOUR 

2 

SUB 

TEM 

2 

TRA 

A1 

1 

g6<*> 

CLA 

X 

2 

CAS 

ONE 

3 

TRA 

B  1 

> 

1 

TRA 

B3 

1 

STZ 

1 

G6TAB* 

< 

2 

B4 

TIX 

1 

INIT3 

2 

TRA 

INIT3 

1 

B1 

CAS 

TWO 

3 

HLT 

> 

TRA 

B2 

= 

1 

B: 

CLA 

TWO 

< 

2 

B5 

STO 

1 

G6TAB* 

2 

TRA 

B4 

1 

B2 

CLA 

FOUR 

2 

TRA 

B5 

l 
TABLE II-5 - LOOKUP FUNCTION PROGRAM, IBM 7090


Machine 

Item 

Instruction 

Remarks 

cycles 

CLA 

N 

2 

STO 

k 

Highest  activity  num¬ 
ber 

2 

CLA 

X 

2 

STO 

X 

2 

LOK3 

LXA 

2 

k 

2 

CLA 

2 

ADDRXk(x) 

2 

STA

LOK1 

2 

LXA 

4 

X 

2 

LOK1 

CLA 

4 

x^(x)TAB 

2 

STO 

2 

OTAB 

Best  allocation  for  k 
activity 

2 

CLA 

k 

2 

SUB 

1 

3 

CAS 

ZERO 

1 

TRA 

LOK2 

> 

1 

HLT

END 

= 

? 

HLT 

i  < 

2 

LOK2 

STO 

k 

|  1 
1 

2 

CLA 

X 

2 

SUB 

2 

OTAB

2 

STO 

X 

i 

; 

:  TRA 

LOK  \ 
TABLE II-6 - COMMON STORAGE


Item 

Data 

Remarks 

T4 

C 

T3 

A 

T2 

N 

T 1 

X 

max 

+6 

TRA 

g^x) 

+5 

TRA 

g2(x) 

+4 

TRA 

g3(x) 

+3 

TRA 

g4(x) 

+2 

TRA 

»5(x> 

+  1 

TRA 

*6,x) 

ADDR 

ST.X 

TWONE 

ONE 

k 

-  00 

p 

xk 

INDi 

1 

IND2 

1 

1 

ADDR  x^fx) 

ADDRESS 

x  j(x)TAB 

ADDRESS 

x2(x)TAB 

ADDRESS 

x3(x)TAB 

ADDRESS 

x4(x)TAB 

ADDRESS 

x5(x)TAB 

ADDRESS 

x6(x)TAB 

XO 

i 
TABLE II-6 - COMMON STORAGE (Continued)


Item 

Data 

Remarks 

Y 

ZERO 

COMMON 

COMMON  +  1 

COMMON  +  2 

COMMON  +  3 

C9 

CIO 

TEM 

TEM  +  1 

C7      400.002310715    -0.00467377
C5      000.050632127     0.07968968
C3      400.512567405    -0.64596371
C1      001.444176646     1.57079630
2/π     000.505746037     0.6366198

3 

R 

2π      006.220773230     6.2831853
3π/2    004.554574363     4.7123889
π       003.110325514     3.1415927
π/2     001.444176646     1.5707963

SINK' 

-0 

a 

s2 

TWO 

FOUR 

| 
APPENDIX III - PROGRAMMING OF THE DYNAMIC PROGRAMMING
TECHNIQUE FOR MACHINE I (PARALLEL)


1.  INTRODUCTION 

Described in this appendix is the sample problem that was programmed
for Machine I using the dynamic programming technique. The objectives
were to develop parallel solutions and programming techniques and to
determine what difficulties might arise in programming for Machine I.
Hence, the sample problem was kept small and no attempt was made to
extract maximum parallelism and speed.

The narratives and programs for the Machine I sample-problem programming
are detailed under Item 2; this description revolves around six activity
functions, a maximization function, and a lookup function. The
Machine I programming results are presented under Item 3. Under
Item 4, these results are compared with those resulting from the
programming of the same problem for a sequential computer.

2.  NARRATIVES  AND  PROGRAMS 
a.  Activity Function 1

The flow chart for activity function 1, g1(x) = x, is shown in Figure III-1;
the program, in Table III-1; and the data vector format, in Table III-2.

This function is started by storing an ENB1 g1, JMP G1 instruction in
the WAIT LIST. A free processor takes this instruction, enters g1 into
index register 1, and jumps to G1, which is the address of the first
instruction of the g1(x) activity function.

Index register 2 is initialized with zero, and index register 3 is loaded
with n, which is the number of times g1(x) is to be calculated. The
Figure III-1 - Activity Function 1 Flow Chart, Machine I
TABLE III-1 - ACTIVITY FUNCTION 1 PROGRAM, MACHINE I
TABLE III-2 - ACTIVITY FUNCTIONS 1 AND 2 DATA VECTOR FORMATS, MACHINE I

Function g1(x)        Function g2(x)
Address   Data        Address   Data       Address   Data
g1        x           g2        x          g2 + 24   TEMP x
g1 + 1    Δ           g2 + 1    Δ          g2 + 25   TEMP x + 1Δ
g1 + 2    n           g2 + 2    n          g2 + 26   TEMP x + 2Δ
g1 + 3    x0          g2 + 3    x0^2       g2 + 27   TEMP x + 3Δ
g1 + 4    x1          g2 + 4    x1^2       g2 + 28   TEMP x + 4Δ
g1 + 5    x2          g2 + 5    x2^2       g2 + 29   TEMP x + 5Δ
g1 + 6    x3          g2 + 6    x3^2       g2 + 30   TEMP x + 6Δ
g1 + 7    x4          g2 + 7    x4^2       g2 + 31   TEMP x + 7Δ
g1 + 8    x5          g2 + 8    x5^2       g2 + 32   TEMP x + 8Δ
g1 + 9    x6          g2 + 9    x6^2       g2 + 33   TEMP x + 9Δ
g1 + 10   x7          g2 + 10   x7^2       g2 + 34   TEMP x + 10Δ
g1 + 11   x8          g2 + 11   x8^2       g2 + 35   TEMP x + 11Δ
g1 + 12   x9          g2 + 12   x9^2       g2 + 36   TEMP x + 12Δ
g1 + 13   x10         g2 + 13   x10^2      g2 + 37   TEMP x + 13Δ
g1 + 14   x11         g2 + 14   x11^2      g2 + 38   TEMP x + 14Δ
g1 + 15   x12         g2 + 15   x12^2      g2 + 39   TEMP x + 15Δ
g1 + 16   x13         g2 + 16   x13^2      g2 + 40   TEMP x + 16Δ
g1 + 17   x14         g2 + 17   x14^2      g2 + 41   TEMP x + 17Δ
g1 + 18   x15         g2 + 18   x15^2      g2 + 42   TEMP x + 18Δ
g1 + 19   x16         g2 + 19   x16^2      g2 + 43   TEMP x + 19Δ
g1 + 20   x17         g2 + 20   x17^2      g2 + 44   TEMP x + 20Δ
g1 + 21   x18         g2 + 21   x18^2
g1 + 22   x19         g2 + 22   x19^2
g1 + 23   x20         g2 + 23   x20^2
contents of the first location of the g1 vector, which is x, is then stored
in the third word of the vector. Here x is the initial value of the vector
from which all subsequent values of the activity function are calculated;
it is increased by Δ, then shifted into Q, and a series of instructions is
executed to enable the starting of processors to operate on the
maximization routine.

The maximization routine needs the beginning address of the vector and
the value of the resource for which activity function returns are currently
available. For every return calculated for the g1(x) activity function, a
processor is started. The information transferred to this processor is
u1, 0, and j.

Inserted into the address field of the instruction at G1M + 1 is the
location where the contents of the index registers are stored. The
instruction is then stored in the WAIT LIST for an available processor. The
contents of Q are now shifted back into A. Index 2 is tested to determine if
all iterations have been completed. If not completed, index 2 is
incremented, and the loop is repeated. When index 2 equals 21, the operation
is halted.

Successive values of the function are stored in successive locations of the
vector. Each location has a unique name as determined by B1 + B2 + 3;
B1 equals g1; B2 is incremented by 1 for each iteration. Successive
names of elements of the vector are g1 + 0 + 3, g1 + 1 + 3, g1 + 2 + 3,
. . . .
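
The naming rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a transcription of Table III-1: the dictionary `memory` stands in for the named memory, the tuple placed on `wait_list` is a hypothetical stand-in for the instruction stored in the WAIT LIST, and the layout (x, the increment, and n in the first three words) is taken from Table III-2.

```python
def activity_function_1(memory, g1, u1):
    """Fill the g1 data vector and return the processor-start requests.

    memory : dict mapping a name (an integer address) to a value
    g1     : name of the first word of the g1 data vector (plays the role of B1)
    u1     : name of the maximization vector handed to each started processor
    """
    x = memory[g1]          # first word: initial resource value x
    delta = memory[g1 + 1]  # second word: resource increment
    n = memory[g1 + 2]      # third word: number of returns to calculate
    wait_list = []
    for b2 in range(n):                 # b2 plays the role of index register 2
        memory[g1 + b2 + 3] = x         # unique name B1 + B2 + 3; g1(x) = x
        # Hypothetical record of the indices (u1, 0, j) passed to a free processor.
        wait_list.append(("MAX", u1, 0, b2))
        x += delta
    return wait_list
```

Called with `memory = {100: 0.0, 101: 0.125, 102: 21}` and `g1 = 100`, the sketch fills names 103 through 123 and queues one maximization request per return.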

b.  Activity Function 2

The flow chart for the g2(x) = x^2 activity function is shown in Figure III-2;
the program, in Table III-3; and the data vector format, in Table III-2
along with the g1(x) format.

This function is started in the same manner as activity function g1(x).

TABLE III-3 - ACTIVITY FUNCTION 2 PROGRAM

When there is an available processor and the JMP G2 is executed, zero is
entered into index 2, and (B1 + 2) = n is entered into index 3. The initial
value of the data vector, x, is stored temporarily in an address equal to
B1 + B2 + B3 + 3. On the first execution of the routine, this address is
equal to g2 + 0 + n + 3. In subsequent executions, B2 is incremented up
to a maximum of n = B3. Hence, for each execution there is a unique
temporary storage address where x is stored prior to forming x^2.
Then x^2 is stored in the data vector at the address equal to B1 + B2 + 3.
Now x is erased, and this location becomes an available word of memory
capable of being named and used by another routine. This routine is
repeated n times until all values of the activity function have been calculated.

Activity functions g1(x) and g2(x) are quite simple. Except for the method
of starting the processors, they could be run on a sequential machine.
c.  Activity Function 3

The flow chart for the g3(x) = √x activity function is shown in Figure III-3;
the program, in Table III-4; and the data vector format, in Table III-5.

This function is started by transferring indices g3, 2, and 0 to an
available processor and jumping to the subprogram, INIT. Indices g3, 2,
and 0 are in the β1 table, and the LDI β1, JMP I instruction is stored in
the WAIT LIST to start an available processor.

The term β1 designates index values that are used as inputs to the I
subprogram; β2 and β3 designate index values used as inputs to the LOOP
and Q subprograms. Each time one of these indices is stored, an
instruction is also stored in the WAIT LIST. This instruction has the
pertinent β and subprogram address to enable the processor to acquire the
index values and to jump to the appropriate subprogram. Since the index
transfer operation is complete at the time of the jump to the subprogram, and
since the index value in the β table is no longer of use, this information
is erased from the β table.

Three more processors are started with indices g3, 0, 0; g3, 1, 0; and
g3, 2, 0. These indices are stored in β2 and are used to start the
subprogram LOOP. Each time a processor is started on LOOP, another
one is started on subprogram Q.

The g3, 2, and 0 indices sent to the I program were the beginning of a
tree of indices generated to permit parallel calculation of the square roots.
With an index of i = 2 as input to the I program, 2i and 2i - 1 are
generated and used as inputs to the I program and to the LOOP program; 2i and
2i - 1 in turn generate 4i, 4i - 1, 4i - 2, and 4i - 3. Eventually, a
calculated index exceeds the vector size and index calculation halts.
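
The index tree can be illustrated with a short Python sketch. It is an assumption-laden model, not the I subprogram itself: a first-in, first-out queue stands in for the WAIT LIST, and the function name `spawn_index_tree` and its parameters are hypothetical. Each index i spawns 2i - 1 and 2i, and a branch stops when the new index exceeds the vector size n.

```python
from collections import deque


def spawn_index_tree(n, root=2):
    """Return the indices generated from `root`, in the order they are spawned."""
    spawned = []
    pending = deque([root])          # stand-in for the WAIT LIST
    while pending:
        i = pending.popleft()
        for child in (2 * i - 1, 2 * i):
            if child > n:            # index exceeds the vector size: branch halts
                continue
            spawned.append(child)    # one LOOP processor would be started here
            pending.append(child)    # and one more I processor
    return spawned
```

Because every index k of 3 or more has exactly one parent (k / 2 rounded up), each index from 3 to n is produced once; indices 0, 1, and 2 are the ones started separately in the narrative.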

The LOOP subprogram calculates the square root using Newton's
iteration. The Q program determines whether the exponent is even or odd, to
determine the exponent of the square root.
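
A minimal Python sketch of this split between fraction and exponent follows. It is not a transcription of Table III-4: the use of `math.frexp` and `math.ldexp`, the starting guess of fraction/2 + 1/2, and the iteration count are assumptions chosen to mirror the narrative.

```python
import math


def newton_sqrt(x, iterations=4):
    """Square root by Newton's iteration on the fraction, halving the exponent."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    fraction, exponent = math.frexp(x)   # x = fraction * 2**exponent, 0.5 <= fraction < 1
    if exponent % 2:                     # odd exponent: shift the fraction so it is even
        fraction /= 2.0
        exponent += 1
    guess = fraction / 2.0 + 0.5         # initial guess for sqrt(fraction)
    for _ in range(iterations):
        guess = (fraction / guess + guess) / 2.0   # Newton step
    return math.ldexp(guess, exponent // 2)        # reattach half the exponent
```

With the fraction confined to the range 0.25 to 1, four Newton steps from this starting guess are enough for double precision; the original program's iteration count is not stated in the narrative.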

In addition to the data vector generated by the program, three other
TABLE III-4 - ACTIVITY FUNCTION 3 PROGRAM, MACHINE I


Item 

Instruction 

Remarks 

Time (μsec)

LDI 

01 

30 

JMP 

INIT 

01 

INIT 

LDA 

LI 

Bl,  B2,  B3 

30 

BGN 

01 

gy  2,  0 

ENB2 

i 

30 

ENB4 

2 

1 

INIT  1 

STI 

02 

30 

LDA 

L02 

BGN 

02 

30 

INB2 

1 

BJP4 

INIT  1 

30 

HLT 

LDI 

01 

30 

JMP 

I 

01 

Bl,  B2,  B3 

gy  2,  C 

* 

A 

SEH 

01 

30 

LDI 

01 

INB2 

2  -1 

i  a  2i  -  1 

30 

ENA 

2  0 

COM 

1  2 

2i  •  1 : n 

30 

HLT 

5 

STI 

01 

5 

30 

LDA 

L01 

j 

BGN 

01 

01 

30 

STI 

02 

Bl.  32,  B3 
TABLE III-4 - ACTIVITY FUNCTION 3 PROGRAM, MACHINE I (Continued)


Item 

Instruction 

. —  . . — 1 

Remarks 

Time (μsec)

LDA 

L/32 

i 

g3>  21  -  1.  0  | 

30 

BGN 

02 

j 

ENA 

1 

i  -  2i 

COM  1 

2 

2i:n 

HLT 

> 

30 

STI 

01 

LDA 

L/31 

01 

30 

BGN 

01 

Bl,  B2,  B3 

STI 

02 

g3*  2i,  0 

30 

LDA 

L02 

BGN 

02 

30 

HLT 

L01 

LDI 

•  •  • 

JMP 

I 

L/32 

LDI 

.  .  • 

JMP 

LOOP 

LDI 

02 

30 

JMP 

LOOP 

02 

Bl,  B2,  B3 
g y  0-  0 

LOOP 

SEH 

02 

%y  1.  0 

30 

i  LDI 

02 

iy  2.  0 

STI 

02 

g3,  i.  0 

30 

LDA 

L/33 

i 

! 
TABLE III-4 - ACTIVITY FUNCTION 3 PROGRAM, MACHINE I (Continued)


Item 

Instruction 

Remarks

Time (μsec)

BGN 

03 

30 

ENA 

2 

0 

FMP 

I 

1 

iA 

60 

FAD 

1 

i 

STA 

1 

2 

T2 

ni 

30 

AND 

f  MASK 

STA 

1 

2 

T3 

£ 

30 

ARS 

1 

ADD 

1/2 

30 

STA 

1 

2 

T1 

ENB4 

2 

30 

NOD 

LOOP  I 

LDA 

1 

2 

T3 

60 

FDV 

1 

2 

T1 

g 

SEH 

1 

2 

T1 

60 

FAD 

1 

2 

T1 

FDV 

2 

60 

STA 

1 

2 

T1 

BJP4 

LOOP  1 

i 

30 

HLT 

L3 

LDI 

•  t  • 

30 

JMP 

Q 

LDI 

03 

30 

JMP 

Q 

Q 

SEH 

03 

30 

LDI 

03 

LDA 

1 

2 

T2 

30 

AND 

QIMSK 

Exponent  Bit  1 
TABLE III-4 - ACTIVITY FUNCTION 3 PROGRAM, MACHINE I (Continued)


Item 

Instruction 

Remarks 

Time (μsec)

JNZA  QA 

Q1  t  0 

30 

LDA  1  2  T2 

Q1  /  0 

AND  QMSK 

Exponent 

30 

NOP 

QB 

ARS  1 

30 

ADD  1  2  Tl 

Fraction 

STA  1  2  2 

/n 

30 

HLT 

LDA  1  2  T2 

30 

AND  QMSK 

ADD  Q1 

30 

JMP  QB 

temporary vectors, T1, T2, and T3, are generated to hold (1) the initial
calculated guess of the square root, (2) the number itself, and (3) the
fractional part of the number.

Temporary storage appears to be sizeable. The temporary addresses are
actually reserved addresses not necessarily occupied. Only a portion of
this block would be filled at any one time, since processors operating on
the program are continually started and stopped as data are entered, used,
and erased. The addresses when occupied are not available for use by
other processors. However, when the data are erased the location is then
free.

In a conventional memory, temporary storage is defined as a certain block
of words occupied at a certain time. This area cannot be used otherwise
to store instructions or data.
In the Machine I memory a portion of each word is used to denote the
name or address of the word. Any word can have any name that is
representable in the 24 bits of the name field. In addition, any unnamed word
is available for use by any program at any time.

It  is  possible  through  indiscriminate  naming  to  have  common  names  in  a 
number  of  programs,  which  situation  may  be  undesirable.  Hence,  where 
a  large  number  of  programs  are  running,  a  portion  of  the  name  field,  the 
prefix,  should  be  used  to  isolate  names  to  a  particular  program.  Such  a 
prefix  name  is  unique  to  a  particular  program.  The  original  name  plus 
the  prefix  constitute  a  unique  name  for  the  individual  program. 

During  the  study,  the  unique  prefix  names  were  carried  in  index  register  1. 
The  other  indices  and  address  fields  of  the  instructions  were  used  to  gen¬ 
erate  the  suffix  names. 
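
The prefix-plus-suffix naming scheme can be modeled with a small Python class. This is a sketch under assumptions: the 24-bit name width comes from the text, but the 12/12 split between prefix and suffix, the class name `NamedMemory`, and its methods are hypothetical and are not part of the Machine I design as described.

```python
NAME_BITS = 24     # width of the name field, from the text
SUFFIX_BITS = 12   # hypothetical split between prefix and suffix


class NamedMemory:
    """Toy model of a memory whose words are found by name, not by position."""

    def __init__(self):
        self._words = {}

    @staticmethod
    def name(prefix, suffix):
        """Combine a program prefix and a suffix into one 24-bit name."""
        if not 0 <= suffix < (1 << SUFFIX_BITS):
            raise ValueError("suffix does not fit")
        if not 0 <= prefix < (1 << (NAME_BITS - SUFFIX_BITS)):
            raise ValueError("prefix does not fit")
        return (prefix << SUFFIX_BITS) | suffix

    def store(self, prefix, suffix, value):
        self._words[self.name(prefix, suffix)] = value

    def fetch(self, prefix, suffix):
        """Return the word, or None if no word with this name is present."""
        return self._words.get(self.name(prefix, suffix))

    def fetch_and_erase(self, prefix, suffix):
        """Return the word and free it, so the location is available again."""
        return self._words.pop(self.name(prefix, suffix), None)

    def words_in_use(self):
        return len(self._words)
```

Two programs that both use suffix 3 do not collide as long as their prefixes differ, and erasing a temporary word immediately reduces the number of occupied words.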

d.  Activity  Function  4 

The flow chart for the g4(x) = 2 sin x activity function is shown in
Figure III-4; the program, in Table III-6; and the data vector format, in
Table III-7.

This function is composed of six subprograms: HSIN, SER, SIN (i + j)Δ,
COS (i + j)Δ, SIN (x + jΔ), and COS (x + jΔ). HSIN computes α and ᾱ for
the input, x, and also β and β̄ for the input, Δ; α, ᾱ, β, and β̄ are inputs
to SER for the calculation of sin x, cos x, sin Δ, and cos Δ.

Sin 2Δ and cos 2Δ are calculated, and a tree is started to generate
indices that permit calculation of sin iΔ, cos iΔ, sin (i - 1)Δ, and cos
(i - 1)Δ for i > 2. For each incremental angle, sin (x + iΔ) and cos (x +
iΔ) are computed and stored in the output data vector.

To start the SIN routine, index g4, 0, 0 and index g4 + 1, 4, 0 are
transferred to two available processors. These units execute the HSIN
subprogram, one working with x and the other working with Δ. After α, ᾱ, β,
and β̄ are calculated, four processors are started, each executing the
SER subprogram. The outputs are sin x, cos x, sin Δ, and cos Δ. The
Figure III-4 - Activity Function 4 Flow Chart, Machine I

(Recoverable box labels from the flow chart:
SIN (i + j)Δ = SIN iΔ COS jΔ + COS iΔ SIN jΔ
COS (i + j)Δ = COS iΔ COS jΔ - SIN iΔ SIN jΔ
SIN (x + jΔ) = SIN x COS jΔ + COS x SIN jΔ
COS (x + jΔ) = COS x COS jΔ - SIN x SIN jΔ)
TABLE III-6 - ACTIVITY FUNCTION 4 PROGRAM, MACHINE I


Item 

Instruction

Remarks

Time (μsec)

LDI 

P 

P 

30 

JMP 

HSIN 

1 

Bl,  B2,  B3 
g4»  o.  0 

g4  l  r  5’  0 

HSIN 

SEH 

P 

30 

LDI 

P 

LDA 

1 

4 

x  or  A 

30 

NOP 

HS 

FS'J 

zt 

60 

NOP 

COM 

2« 

30 

JMP 

HS 

STA 

1 

2 

3 

R  or  r 

30 

NHA 

Y 

ARS 

I 

Y  A 

30 

STQ 

1 

2 

5 

*  sin  x,  *  sin  A 

LRS 

1 

30 

STQ 

I 

2 

8 

cos  x,  *  cos  A 

HS2 

SEH 

1 

2 

3 

30 

LDA 

1 

2 

3 

R  or  r 

FMP 

2/» 

60 

STA 

1 

2 

3 

a  or  £ 

LAC 

1 

2 

3 

30 

STA 

l 

2 

6 

•  •* 

0  or  5 

!  BJP2 

HLT 

30 

NOP 

HSl 

ENB4 

3 

30 

NOP 

0  output 
TABLE III-6 - ACTIVITY FUNCTION 4 PROGRAM, MACHINE I (Continued)


Item 

" 

Instruction 

Remarks 

Time (μsec)

SEH 

Cot 

30 

FAD 

Ca 

STA 

c a 

30 

LDA 

N  +  1 

BGN 

n 

30 

ENB2 

STI 

g 

30 

LDA 

P 

START 

BGN 

g 

SIN  (x  +  jA),  j  =  l 

30 

STI 

h 

COS  (x  +  jA),  j  =  I 

LDA 

P  +  1 

30 

BGN 

h 

HLT 

30 

NOP 

N 

LDI 

f  •  « 

JMP 

SIN  (i  +  j)A 

N  +  1 

LDI 

•  •  • 

JMP 

COS  (i  +  j)A 

LDI 

m 

30 

JMP 

SIN  (i  +  j)A 

SIN  <i  +  j)A 

SEH 

m 

30 

ADD 

m 

BI.  B2,  B3 

SijA 

LDA 

I  2 

SA 

84*  >•  J 

30 

NPJ 

SljA 

SijB 

FMP 

I  3 

CA 

60 

NPJ 

SijB 

STA 

1  2 

3  SAT 

(i  +  j)TEMP 

30 

NOP 

SijC 

LDA 

1 

3  SA 

30 



Il 


Instruction 


Remarks 


Time (μsec)


SijD 


P 

P  +  1 


COS  (i  ♦  j)A 


NPJ 

SijC 

FMP 

1 

2 

CA 

60 

NPJ 

SijD 

SEH 

1 

2 

3  SAT 

30 

ADD 

1 

2 

3  'SAT 

STA 

1 

2 

3  SA 

30 

INB3 

2 

4 

STI 

g 

g  or  table 

30 

LDA 

P 

BGN 

g 

Bl,  B2,  B3 

30 

STI 

L 

g4>  J 

LDA 

P  +  l 

30 

BGN 

h 

HLT 

30 

NOP 

LDI 

«  •  • 

8 

JMP 

SIN  (x  +  jA) 

LDI 

•  •  • 

h 

JMP 

COS  (x  +  jA) 

LDI 

n 

30 

JMP 

COS  (i  -r  j)A 

Bl,  B2,  B3 

S4'  i 

SEH 

n 

30 

ADD 

n 

LDA 

1 

2 

SA 

SIN  iA 

30 

NPJ 

FMP 

l 

3  SA 

SIN  jA 

60 

NPJ 
TABLE III-6 - ACTIVITY FUNCTION 4 PROGRAM, MACHINE I (Continued)


Item 

Instruction 

Remarks 

Time (μsec)

STA  ] 

l  2 

3  CAT 

(i  +  j)TEMP 

30 

LDA  1 

i  2 

CA 

NPJ 

60 

I  FMP  1 

1 

3  CA 

NPJ 

30 

SEH  1  2 

3  CAT 

SUB  1  2 

3  CAT 

30 

STA  1  2 

3  CA 

INB3 

2 

4 

i  +  j 

30 

ENB2 

3  -1 

i  +  j  -  1 

LAC  1 

2 

-n 

30 

INA 

2 

3  4 

JNGA 

CA 

30 

JNZA 

HLT 

CA 

STI 

e 

30 

JJ)A 

M 

BGN 

c 

LDI _ ,  JMP,  SijA 

30 

STI 

i 

LDA 

m  +  1 

30 

BGN 

/ 

4 

LDI _ ,  JMP,  CijA 

INB2 

i 

30 

LAC  I 

l 

2 

-n 

I 

INA 

2 

3  4 

30 

JNGA 

CB 

JNZA 

30 

NOP 

CB 

STI 

e 

3C 

LDA 

M 

LDI _ .  JMP  SijA 

BGN 

e 

30 

*-A 
TABLE III-6 - ACTIVITY FUNCTION 4 PROGRAM, MACHINE I (Continued)


Instruction 


Remarks 


Time (μsec)


LDA 

BGN 

HLT 

NOP 


m  +  1 


LDI  ,  JMP  CijA 


JMP 

SIN  (x  +  jA)  SEH 
ADD 


LDA 


FMP  1 


SIN  x  +  jA 


3  CA 


Bl,  B2,  B3 

g4.  o.  J 


COS  jA 


G4m 


STA  1 
LDA 
FMP  1 
SEH  1 
ADD  1 
STA  1 


INAL 


LDA 


BGN 

HLT 

NOP 


3  SaT 


3  SA 
3  SaT 
3  SaT 
3  sa 
GAM 


3  mteml 
GIM  +  1 


’  mteml 


SIN  x  +  jA 


START  MAX 
TABLE III-6 - ACTIVITY FUNCTION 4 PROGRAM, MACHINE I (Continued)


indices transferred to the SER processors are g4, 3, Sα; g4, 6, Cα; g4, 9,
SΔ; and g4, 12, CΔ; g4 is the prefix name; 3, 6, 9, and 12 are data vector
addresses, relative to g4, of the input variables α, ᾱ, β, and β̄; Sα, Cα,
SΔ, and CΔ are initial addresses of the output data vectors.

The SER subprogram generates one set of indices to start the SIN (i + j)Δ
and COS (i + j)Δ subprograms and also generates one index set to start the
calculation of sin (x + Δ) and cos (x + Δ). The index set that starts the SIN
(i + j)Δ and COS (i + j)Δ subprograms is used to generate the tree of
indices that are, in turn, inputs to the SIN (i + j)Δ, COS (i + j)Δ, SIN (x +
jΔ), and COS (x + jΔ) subprograms.

The indices transferred to the processor executing the HSIN subprogram
are erased, and x or Δ is then normalized mod 2π. The normalized
quantity is next compared to the four quadrant angles, each of which has
stored in its lower two bits the correct signs of sines and cosines of angles
within that quadrant. The NHA y instruction obtains the smallest quadrant
angle larger than the argument in the accumulator and replaces the
contents of the accumulator. The sign bits are shifted into Q and stored.

Now α and ᾱ are computed, and the indices to be transferred to the SER
program are stored prior to starting the four SER processing units. The
SER program computes sin x, cos x, sin Δ, and cos Δ from inputs α, ᾱ,
β, and β̄ using the Hastings sine series. The processing unit that
computes cos Δ generates indices and starts instructions for the processors
to begin executing the SIN (i + j)Δ, COS (i + j)Δ, SIN (x + jΔ), and COS
(x + jΔ) subprograms.
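
As an illustration of the series step, the following Python sketch evaluates an odd polynomial in the scaled angle using the coefficients listed as C1, C3, C5, and C7 in Table II-6. It is a sketch under assumptions, not the SER subprogram: the function name is hypothetical, the angle is assumed to be already reduced to the range -π/2 to π/2, and the table's C9 entry is omitted because no value for it appears in the text, so the accuracy is limited to roughly four decimal places.

```python
import math

# Coefficients as listed in Table II-6 (decimal column).
C1 = 1.57079630
C3 = -0.64596371
C5 = 0.07968968
C7 = -0.00467377


def series_sine(angle):
    """Approximate sin(angle) for an angle already reduced to -pi/2..pi/2."""
    a = angle * (2.0 / math.pi)      # scale so that a lies in -1..1
    a2 = a * a
    # Horner evaluation of C1*a + C3*a**3 + C5*a**5 + C7*a**7
    return a * (C1 + a2 * (C3 + a2 * (C5 + a2 * C7)))
```

The cosine can be obtained from the same routine by passing the complementary angle, which is consistent with the narrative's use of a second, complemented input for each of x and the increment.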

The SIN (i + j)Δ subprogram gets its input indices from a temporary table,
after which the indices are erased. The product sin iΔ cos jΔ is found
and added to sin jΔ cos iΔ and stored. Nonpresence jump instructions
are used to be certain that the words fetched from memory are the exact
ones requested. The SIN (i + j)Δ subprogram also generates indices,
which are transferred to processors assigned to the SIN (x + jΔ) and COS
(x + jΔ) subprograms.

The COS (i + j)Δ subprogram gets its index inputs from a temporary table
in a similar manner to the SIN (i + j)Δ subprogram. While the
computation is similar to that for the SIN (i + j)Δ subprogram, the COS (i + j)Δ
program generates indices that are used by the SIN (i + j)Δ and COS
(i + j)Δ subprograms to generate more branches of the tree x + jΔ. For
each i, j input, two sets of indices are generated: j = i + j, i = j - 1 and
j = i + j, i = j. Hence, for inputs i = 1 and j = 1, the output sets
are i = 1, j = 2 and i = 2, j = 2. These in turn generate i = 2, j = 3;
i = 3, j = 3 and i = 3, j = 4; i = 4, j = 4.
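
The index tree and the angle-addition step can be sketched together in Python. The sketch makes several assumptions: it starts from the pair i = 1, j = 1 rather than from separately computed double-angle values, a queue stands in for the WAIT LIST, and the re-queue branch is only a rough analogue of a nonpresence jump. The function and variable names are hypothetical.

```python
from collections import deque


def multiple_angle_tables(sin_d, cos_d, n):
    """Build sin(k*d) and cos(k*d) for k = 1..n from sin(d) and cos(d)."""
    sin_t = {1: sin_d}
    cos_t = {1: cos_d}
    pending = deque([(1, 1)])            # index pairs waiting for a processor
    while pending:
        i, j = pending.popleft()
        total = i + j
        if total > n:                    # index exceeds the vector: branch halts
            continue
        if i not in sin_t or j not in sin_t:
            pending.append((i, j))       # operand not present yet: try again later
            continue
        sin_t[total] = sin_t[i] * cos_t[j] + cos_t[i] * sin_t[j]
        cos_t[total] = cos_t[i] * cos_t[j] - sin_t[i] * sin_t[j]
        pending.append((total - 1, total))   # new pair: i = j - 1
        pending.append((total, total))       # new pair: i = j
    return sin_t, cos_t


def shifted_sine(sin_x, cos_x, sin_jd, cos_jd):
    """sin(x + j*d) from sin x, cos x and the tabulated sin(j*d), cos(j*d)."""
    return sin_x * cos_jd + cos_x * sin_jd
```

Each multiple k from 2 to n is reached through exactly one pair, so no value is computed twice, and the depth of the tree grows only with the logarithm of n.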

For each set of indices sent to the SIN (i + j)Δ subprogram, the sum j =
i + j is sent to the SIN (x + jΔ) and COS (x + jΔ) subprograms. For each
additional SIN (x + jΔ) subprogram generated, there is a processing unit
assigned to the maximization program.
e.  Activity Function 5

The flow chart for the g5(x) activity function is shown in Figure III-5; the
program, in Table III-8; and the data vector format, in Table III-9. The
returns calculated from this function are

   g5(x) = 2x,       0 ≤ x ≤ 1
   g5(x) = 4 - 2x,   1 ≤ x ≤ 2

The g5(x) function is executed by transferring the address of the g5 data
vector to a processing unit and jumping to G5. Index registers 2 and 3
are initialized with 0 and n; x is obtained from the data vector and stored
in (1) a temporary word at the end of the data vector and (2) the first
word of the output portion of the data vector.

The iterative portion of the program begins by fetching and erasing the
temporary word = x, adding Δ to it, and storing x + Δ back in the
temporary location. The incremented value in A is now doubled and tested.
If 0 ≤ x ≤ 1, then the doubled value is stored in the data vector at the
address B1 + B2 + 4. If 1 ≤ x ≤ 2, then 4 - 2x is stored.

Index 2 is now compared to index 3. Index 2 carries the current
iteration number, while index 3 carries the maximum number of iterations
to be performed. If the maximum has not been exceeded, the program
is repeated with index 2 incremented. If the maximum has been reached,
the processing unit halts.
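
The iterative portion described above amounts to the following Python sketch. It covers only the loop (increment, double, test, store) and assumes the returns are collected in a list rather than in named memory; the function name and parameters are hypothetical.

```python
def g5_returns(x, delta, n):
    """Return g5 at x + delta, x + 2*delta, ..., x + n*delta."""
    returns = []
    for _ in range(n):              # index 2 counts up to the limit in index 3
        x += delta                  # fetch the temporary word and add the increment
        doubled = 2.0 * x
        if x <= 1.0:
            returns.append(doubled)         # rising branch: 2x
        else:
            returns.append(4.0 - doubled)   # falling branch: 4 - 2x
    return returns
```

For example, starting at x = 0 with an increment of 0.25, the returns rise to 2 at x = 1 and fall back to 0 at x = 2.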

f.  Activity Function 6

The flow chart for activity function g6(x) = 2[x] is shown in Figure III-6;
the program, in Table III-10; and the data vector format, in Table III-9,
along with the g5(x) format.

This function is started by transferring the address of the data vector
to a processing unit and jumping to G6; x is obtained from the first word
of the vector and stored in a temporary location. If x is less than 1,
zero is stored in the data vector; if x is equal to or greater than 1 but
TABLE III-8 - ACTIVITY FUNCTION 5 PROGRAM, MACHINE I


Item 

Instruction 

Remarks 

Time (μsec)

ENB1 

*5 

30 

JNP 

G5 

G5 

ENB2 

4 

30 

LDB3 

i 

2 

LDA 

l 

4 

30 

STA 

1 

3 

3 

STA 

1 

3 

30 

NOP 

G5B 

SEH 

1 

3 

3 

30 

LDA 

l 

3 

3 

ADD 

1 

1 

30 

STA 

1 

3 

3 

ALS 

1 

30 

INA 

.2 

JNGA 

GSA 

< 

8 

30 

JNZA 

GSA 

8 

INA 

2 

l 

30 

STA 

1 

3 

4 

ENA 

4 

30 

SEH 

I 

3 

4 

SUB 

1 

4 

30 

STA 

I 

2 

3 

4 

G3C 

ISK2 

3 

4 

30 

JNP 

GSB 

HLT 

30 

NOP 

! 

GSA 

INA 

2 

i 

i 

30 

STA 

1 

2 

4 

i 

JMP 

CSC 

; 

30 

NOP 

TABLE III-10 - ACTIVITY FUNCTION 6 PROGRAM, MACHINE I


Item 


Instruction 


Remarks 


ENB1 

*6 

JMP 

G6 

G6 

ENB2 

4 

LDB3 

1 

2 

LDA 

1 

4 

JMP 

G6E 

G6D 

LDA 

1 

3 

3 

ADD 

I 

1 

G6E 

STA 

1 

3 

3 

COM 

1 

JMP 

G6A 

LDA 

4 

G6B 

STA 

1 

2 

3 

LDA 

G6M 

INAL 

z 

3 

STA 

z 

MTEM2 

LDA 

GIM  +  1 

BGN 

MTEM  + 

ISK2 

3 

4 

JMP 

G6D 

HLT 

NOP 

G6A 

COM 

2 

JMP 

GiC 

ENA 

2 

JMP 

G6B 

G6C 

ENA 

4 

JMP 

G6B 

u3 

0 

0 

i 

• 

IMP  MAX 


Time  (psec) 
30 




less than 2, then 2 is stored in the data vector; and if x is equal to or
greater than 2, then 4 is stored in the vector.

For each value stored in the data vector, a processor is started on the
maximization program. The resource for which the value of the activity
function has just been calculated is transferred to the u3 maximization
program. This enables u3 to begin the maximization process to
determine what allocation of this resource will result in the maximum return.

g.  Maximization  Function 

The flow chart for the maximization function is shown in Figure III-7;
the program, in Table III-11; and the data vector format, in Table III-12.

The maximization routine is started by storing an LDI, JMP MAX
instruction in the jump table. The address of the LDI instruction is the
address where the indices to be transferred are stored. These addresses -
namely, MTEM + B3, MTEM1 + B3, MTEM2 + B3, MTEM3 + B3, and
MTEM4 + B3 - contain, respectively, the indices for execution of the
maximization for functions u1, u2, u3, u4, and u5. Index register 1 is used
to hold address u1, u2, u3, u4, or u5; index register 2 is initially zero;
and index register 3 carries j, which indicates the maximum value of
resource for which returns are currently available. As the function returns
are being calculated, processors are being started with the index
information in the MTEM tables.


Index register 4 is loaded with the address located in the second word of
the data vector whose address is in index register 1. The contents of
index register 1 can be u1, u2, u3, u4, or u5. Correspondingly, word 1 of
these data vectors contains the address of the first return in data vectors
g1, g3, g5, u1, or u4. Word 2 of each u vector contains the address of
the first return in data vectors g2, g4 + 15, g6, u3, or u2. Index register
5 is loaded with one of these corresponding addresses.
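
The pairing of vectors (u1 from g1 and g2, u2 from g3 and g4, u3 from g5 and g6, u4 from u1 and u3, u5 from u4 and u2) can be sketched in Python as a tree of pairwise maximizations. This is a sequential illustration under assumptions, not the MAX program of Table III-11: returns are held in plain lists indexed by grid step, and y(j) is taken here to mean the number of steps given to the first operand, which the text does not confirm.

```python
def combine_returns(a, b):
    """Best total return and allocation for each resource level j.

    a, b : returns of two activities (or two partial results), indexed by
           the number of resource increments allocated to each.
    Returns a list of (u, y) pairs, where u = max over i of a[i] + b[j - i]
    and y is the i that achieves it (assumed meaning of the y entries).
    """
    size = min(len(a), len(b))
    best = []
    for j in range(size):
        u, y = float("-inf"), 0          # start from minus infinity
        for i in range(j + 1):
            total = a[i] + b[j - i]
            if total > u:                # keep the first maximum found
                u, y = total, i
        best.append((u, y))
    return best


def maximize_six(g):
    """Combine six return vectors in the pairing used for u1 through u5."""
    def values(pairs):
        return [u for u, _ in pairs]

    u1 = combine_returns(g[0], g[1])
    u2 = combine_returns(g[2], g[3])
    u3 = combine_returns(g[4], g[5])
    u4 = combine_returns(values(u1), values(u3))
    u5 = combine_returns(values(u4), values(u2))
    return u1, u2, u3, u4, u5
```

Because u1, u2, and u3 depend only on the activity returns, they can proceed independently of one another, which is the property the parallel program exploits; u4 and u5 can begin on a resource level as soon as their inputs for that level are present.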


The contents of the address are the sum of the contents of index registers
3 and 5. If this address is present in the memory, the contents are loaded
into A. If B5 = g2 and B3 = 5 = j + 3, then the return from function
g2




Figure III-7 - Maximization Function Flow Chart, Machine I
TABLE III-11 - MAXIMIZATION FUNCTION PROGRAM, MACHINE I


Item 

Instruction 

Remarks 

Time (μsec)

LDI 

30 

JMP 

MAX 

MAX 

LDB4 

1 

1 

Inputs 

30 

LDB5 

i 

2 

B  1,  B2  ,  B3 

LDA 

-  00 

Uj,  0,  j 

30 

STA 

1 

2  3 

3 

u2-  °'  J 

MAXA 

LDA 

3  5 

0,  .1 

30 

NPJ 

MAXA 

u4,  0,  j 

MAXC 

ADD 

2  4 

1 i 

P 

sjl 

o 

30 

NPJ 

MAXC 

1 

AND 

LS24 

30 

INA 

3 

4 

STA 

1 

2  3 

3 

30 

SEH 

1 

2  3 

3 

ADD 

1 

2  3 

3 

30 

BJP3 

MAXB 

t 

ENB3 

2 

it 

30 

| 

;  ENA 

u  2 

INA 

1 

30 

JNZ  A 

MAXE 

JMP 

MAXD 

IKJu^ 

30 

NOP 

! 

MAXE 

ENA 

u  3 

30 

INA 

1 

JNZ  A 

HLT 

30 

LDBl 

U  4 

DOu 

4 

ENB2 

i 

30 
TABLE III-11 - MAXIMIZATION FUNCTION PROGRAM, MACHINE I (Continued)


Item 

J 

Instruction 

Remarks 

STI  3 

MTEM3 

LDA 

GIM  +  1  LDI  MTEM3  + 

B3,  JMP  MAX 

BGN  3 

MTEM3 

HLT 

NOP 

MAXD 

LDB1 

u5 

ENB2  j i 

STI  3 

MTEM4 

LDA 

GIM  +  1  LDI  MTEM4  + 

B3,  JMP  MAX 

BGN  3 

MTEM4 

HLT 

MAXB 

INB2 

JMP 

MAXA 

ML 

LDI 

JMP 

MAX 

MTEM 

U1 

0 

0 

j 

MTEMI 

u2 

0 

i 

0 

j 

MTEM2 

u3 

0 

0 

j 

MTEM3 

u4 

0 

0 

j 

MTEM4 

u5 

0 

0 

j 
TABLE III-12 - MAXIMIZATION FUNCTION DATA VECTOR FORMAT,
MACHINE I

Function u1       Function u2       Function u3       Function u4       Function u5

u1                u2                u3                u4                u5
g1                g3                g5                u1                u4
g2                g4 + 15           g6                u3                u2
u1(0): y1(0)      u2(0): y2(0)      u3(0): y2(0)      u4(0): y4(0)      u5(0): y5(0)
u1(1): y1(1)      u2(1): y2(1)      u3(1): y2(1)      u4(1): y4(1)      u5(1): y5(1)
u1(2): y1(2)      u2(2): y2(2)      u3(2): y2(2)      u4(2): y4(2)      u5(2): y5(2)
u1(3): y1(3)      u2(3): y2(3)      u3(3): y2(3)      u4(3): y4(3)      u5(3): y5(3)
u1(4): y1(4)      u2(4): y2(4)      u3(4): y2(4)      u4(4): y4(4)      u5(4): y5(4)
u1(5): y1(5)      u2(5): y2(5)      u3(5): y2(5)      u4(5): y4(5)      u5(5): y5(5)
u1(6): y1(6)      u2(6): y2(6)      u3(6): y2(6)      u4(6): y4(6)      u5(6): y5(6)
u1(7): y1(7)      u2(7): y2(7)      u3(7): y2(7)      u4(7): y4(7)      u5(7): y5(7)
u1(8): y1(8)      u2(8): y2(8)      u3(8): y2(8)      u4(8): y4(8)      u5(8): y5(8)
u1(9): y1(9)      u2(9): y2(9)      u3(9): y2(9)      u4(9): y4(9)      u5(9): y5(9)
u1(10): y1(10)    u2(10): y2(10)    u3(10): y2(10)    u4(10): y4(10)    u5(10): y5(10)
u1(11): y1(11)    u2(11): y2(11)    u3(11): y2(11)    u4(11): y4(11)    u5(11): y5(11)
u1(12): y1(12)    u2(12): y2(12)    u3(12): y2(12)    u4(12): y4(12)    u5(12): y5(12)
u1(13): y1(13)    u2(13): y2(13)    u3(13): y2(13)    u4(13): y4(13)    u5(13): y5(13)
u1(14): y1(14)    u2(14): y2(14)    u3(14): y2(14)    u4(14): y4(14)    u5(14): y5(14)
u1(15): y1(15)    u2(15): y2(15)    u3(15): y2(15)    u4(15): y4(15)    u5(15): y5(15)
u1(16): y1(16)    u2(16): y2(16)    u3(16): y2(16)    u4(16): y4(16)    u5(16): y5(16)
u1(17): y1(17)    u2(17): y2(17)    u3(17): y2(17)    u4(17): y4(17)    u5(17): y5(17)
u1(18): y1(18)    u2(18): y2(18)    u3(18): y2(18)    u4(18): y4(18)    u5(18): y5(18)
u1(19): y1(19)    u2(19): y2(19)    u3(19): y2(19)    u4(19): y4(19)    u5(19): y5(19)
u1(20): y1(20)    u2(20): y2(20)    u3(20): y2(20)    u4(20): y4(20)    u5(20): y5(20)

for a resource allocation of 0.2 is loaded into A. Similarly, the return
for a zero resource assignment to function g1 is obtained via B4 and B2
and added to the previous contents of the accumulator. The least
significant 24 bits of the sum are replaced by (B3) = j. The contents of A,
bits 24 to 71, are now the sum of the returns from activity functions g1
and g2 for resource assignments of 0.2 to g2 and 0 to g1. The j,
representing the assignment of 0.2 units to g2, is stored in the least
significant 24 bits. The contents of A are stored in data vector u1 at the
address that is the sum of the contents of index registers 1, 2, and 3
plus the contents of the address position of the instruction; or
B1 + B2 + B3 + 3 = u1 + 0 + 2 + 3.

For a resource of 0.2, there are now 2 more possible allocation sets:
0.1 to g2 and 0.1 to g1, and 0 to g2 and 0.2 to g1. Returns for these
two assignments are found as indicated earlier, and they are stored in
words with the same address (name). For example, the sum of the g2
and g1 returns for a resource allocation of 0.2 to g2 and 0 to g1 is stored
in a word with an address equal to B1 + B2 + B3 + 3, or u1 + 0 + 2 + 3.

The returns for other permissible combinations of resource allocation -
such as 0.1 to g2 and 0.1 to g1; and 0 to g2 and 0.2 to g1 - are stored in
memory with the same name. The name is derived from the contents of
index register 1 and the quantity of resource that is to be allocated.

Since a large negative value was stored with this name upon entry to the
maximization routine, the first larger entry stored with this name will
be located above the large negative value. The next instruction after the
store (STA 123 3) is a single erase high (SEH 123 3) followed by a fetch
type instruction. The result of this execution is to erase the word whose
name is equal to the sum of B1, B2, B3, and 3 and whose magnitude is
smallest. In this case, after one storage instruction, two words in
memory have the same name:

Name          Data

u1 + 5        some + value

u1 + 5        large - value
After the SEH instruction, the word with the large negative value is
erased. With this sequence of instructions, successive values of a
function could be stored in the memory; and after a machine cycle, one
word of the data vector could be erased, always leaving the largest
value in the vector at the top of the vector. Index register 3, in
combination with index register 2, determines the combinations of allowable
resource allocations. Initially, index 3 has the maximum current
resource, and index 2 has the minimum compatible resource to be
allocated. In the process of finding the maximum return for a given
resource assignment, index 3 is decremented and index 2 is incremented;
for each combination, the appropriate activity returns are found, added
together, and stored in the data vector where the smallest element of
the vector is erased and the largest element of the vector retained.
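
The store-then-erase loop described above can be sketched in Python. This is
only an illustration under stated assumptions, not the Machine I program:
the dictionary-of-lists `memory`, the helper names `store`,
`single_erase_high`, and `maximize`, and the sample return values are all
hypothetical. Each stored word is modeled as a pair (combined return, units
given to the word 2 function), mirroring the j kept in the low-order bits.

```python
NEG_INF = float("-inf")  # stands in for the "large negative value"

def store(memory, name, word):
    """STA (modeled): add a word under a name; names may repeat."""
    memory.setdefault(name, []).append(word)

def single_erase_high(memory, name):
    """SEH (modeled as in the text): drop the smallest word with this name."""
    words = memory[name]
    words.remove(min(words))

def maximize(memory, name, word1, word2, j):
    """Keep the best combined return for j resource units under `name`.

    word1 and word2 are lists of returns indexed by resource units.
    """
    store(memory, name, (NEG_INF, 0))
    b2, b3 = 0, j  # index 2 counts up while index 3 counts down
    while b3 >= 0:
        store(memory, name, (word2[b3] + word1[b2], b3))
        single_erase_high(memory, name)  # only the larger word survives
        b2, b3 = b2 + 1, b3 - 1
    return memory[name][0]
```

For example, with hypothetical returns `[0, 3, 4]` and `[0, 1, 5]` and j = 2,
three sums are stored in turn and only the largest remains under the name.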

Each input to the maximization function results in the formation of a
segment of one of the maximized return vectors. The transfer of index
data u1, 0, and j to a processing unit results in j + 1 pieces of data being
stored in vector u1, all with names u1 + j. When the calculation of the
j + 1 pieces of data is completed, only the largest is retained, and it is
located in the jth position of the vector.

In this example problem, there are 105 possible combinations of inputs
to this single routine. This means that a possible 105 processing units
are operating on one program. Five vectors are to be generated - u1,
u2, u3, u4, u5 - and j can range between 0 and 20 in steps of 1.

There is even more parallelism in this section of the problem. Each
possible resource combination could have been assigned to an arithmetic
unit. In this sample problem, there are 21 possible resource
allocations to the 5 vectors for a total of 1155 possible combinations of
resource to be maximized. Each combination could have been assigned
to a processor unit. This approach, however, would entail more
calculation to specify the input values of the indices. In the present method,
only the maximum value, j, of the resource is transferred, and a loop
is executed j + 1 times.
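
The two counts quoted above can be checked with a few lines of Python. The
reading used here is an assumption: 105 is taken as 5 vectors times 21
values of j, and 1155 as 5 vectors times the sum of j + 1 over all j.

```python
VECTORS = 5          # u1 through u5
LEVELS = range(21)   # j = 0, 1, ..., 20

# One processor per (vector, j) input.
inputs = VECTORS * len(LEVELS)

# One processor per individual resource split: j + 1 splits for each j.
combinations = VECTORS * sum(j + 1 for j in LEVELS)

print(inputs, combinations)  # 105 1155
```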
The remainder of this program sets up the indices to start processing
units calculating the u4 and u5 vectors. Whenever work is being done
on vectors u2 or u3 and the largest value has been found for some
resource, j, then j and the u4 or u5 indices are transferred to a new
processing unit to start work on the u4 or the u5 vector.

h.  Lookup  Function 

The flow chart for the lookup function is shown in Figure III-8; and the
program, in Table III-13.

The  input  to  the  lookup  function  is  the  quantity  of  resource  x  to  be  allo¬ 
cated  to  the  6  activity  functions.  The  resource  is  used  in  combination 
with  the  name  of  a  maximized  data  vector  to  search  the  vector  for  the 
recommended  resource  assignment.  When  the  recommended  assign¬ 
ment  is  found,  the  remainder  is  assigned  to  the  next  maximized  vector; 
and  so  on. 
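
The lookup walk can be sketched as a short recursive routine. Everything
named here is an assumption made for illustration: `tree` maps each
maximized vector to its two inputs (word 1 and word 2), `tables` holds the
recommended assignment to the word 2 input for each resource quantity, and
the sample numbers are invented.

```python
def lookup(tree, tables, node, x, out):
    """Split x units at `node` and pass the remainder down the tree."""
    if node not in tree:        # a leaf: an activity function
        out[node] = x
        return out
    word1, word2 = tree[node]
    y = tables[node][x]         # recommended units for the word 2 input
    lookup(tree, tables, word2, y, out)
    lookup(tree, tables, word1, x - y, out)  # remainder goes to word 1
    return out

# Hypothetical two-level example.
tree = {"u4": ("u1", "g3"), "u1": ("g1", "g2")}
tables = {"u4": [0, 0, 1, 1], "u1": [0, 1, 1, 2]}
allocation = lookup(tree, tables, "u4", 3, {})
```

In this example three units entering at u4 are split one each among g1, g2,
and g3; the allocations always add up to the quantity supplied.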

The search word is made up of the NHA instruction and the contents of
B1, which in turn is equal to the name of the data vector and the quantity
of resource to be allocated. This search word is used to find the
element of the vector that has the recommended assignment for that
quantity of resource. The recommended assignments are stored in the 0 list
of Table III-13.

3.  RESULTS 

a. Activity Functions 1 and 2

The timing charts for the g1(x) and the g2(x) activity functions are
shown in Figures III-9 and III-10, respectively. Since these functions
were relatively simple expressions, it was felt that little would be
gained in time by attempting to compute their various values in parallel.
Hence, one processing unit was assigned to each function, and an
iterative program was written that evaluates the function over the range of
the argument. Each function turns out a new value of the function about
every 180 µsec. For each new functional value calculated, the g2(x)


TABLE III-13 - LOOKUP FUNCTION PROGRAM, MACHINE I


Item 


LOOKUP 


LUI 


Instruction 


JMP 

LOOKUP 

LDB1 

IN  ADD 

INBi 

R 

NI1A 

1 

ENB4 

LA 

STB4 

R  +  I 

LDA 

R 

SUB 

R  +  1 

STA 

R  +  3 

INB2 

I 

LDB1 

2 

IN  ADD 

INBI 

R  +  1 

NHA 

1 

i 

ENB4 

LA 

STB4 

R  +  2 

LDA 

R  +  l 

SUB 

R  f  2 

STA 

R  +  4 

ENB3 

2 

ENB5 

1 

INB2 

1 

LDB1 

2 

IN  ADD 

INBI 

5 

R  ♦  2 

NHA 

I 

i 

ENB4 

LA 

STB4 

3 

6  list 

LDA 

5 

R  ♦  2 

SUB 

3 

ft  iUt 

STA 

3 

8  liet  ♦  l 
TABLE III-13 - LOOKUP FUNCTION PROGRAM, MACHINE I (Continued)


Item 


IN  ADD 


R 


8  LIST 


INB3 

ISK5 

JMP 

HLT 

NOP 

u5 

u„ 


u, 


u 


1 

X 

*5 

*4 

*  '  *5 

yS  '  y4 

*6 

s5 

84 

«3 

*2 
8 1 


Inetruction 


2 

3 

LU1 


y3 

y4  *  y3 
y2 

*  ’  y5  ’  y2 
yi 

y5  *  y6  ‘  yl 
Figure III-9 - Activity Function 1 Timing Chart, Machine I

Figure III-10 - Activity Function 2 Timing Chart, Machine I

function generates the start instructions and indices for the
maximization program. In this manner maximization continues in parallel but
slightly behind the activity function calculation. The maximization program
uses as its inputs the functional values of the g1(x) and g2(x) programs,
which use 1 processor each and calculate in about 3780 µsec.
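
The 3780 µsec figure is consistent with one new value about every 180 µsec.
The check below assumes 21 argument values (resource levels 0 through 20),
which is an inference from the range of j used elsewhere in this appendix
rather than a number stated for these two functions.

```python
points = 21           # assumed: resource levels 0 through 20
interval_usec = 180   # approximate time between successive values

total_usec = points * interval_usec
print(total_usec)  # 3780
```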

b.  Activity  Function  3 

The timing chart for the g3(x) activity function is shown in Figure III-11.
This function displays more parallelism than the g1(x) and g2(x) functions.
For each input to the program, three processors are activated. One
processor calculates indices to maintain the treeing operation, one
executes the iterative loop, and another tests the exponent of the floating
point input number. The number of processors in use varies from 1 to
35, and total time is about 1950 µsec.

The  treeing  operation  is  maintained  by  calculating  from  the  input  indices 
two  more  sets  of  indices,  which  in  turn  result  in  the  calculation  of  four 
sets  of  indices.  Each  level  of  index  calculation  results  in  a  number  of 
index sets equal to a power of two. The quantity of data being generated
increases  exponentially,  while  the  time  to  generate  the  data  increases 
linearly. 

The approximate minimum and maximum output-data times at the second
through eighth levels of indexing are shown in Table III-14; note that the
minimum increase is 180 µsec, and the maximum increase 300 µsec,
from one level to the next. It is possible to predict the calculation of a
large number of square roots; for example, 256 in a time equal to 3090
µsec.
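
The linear growth in time against exponential growth in data can be written
as a small model. The formulas below are fitted to the Table III-14 values
(a level 2 starting point plus a fixed increment per level); the function
names are hypothetical and the model is a sketch, not part of the original
analysis.

```python
def output_times(level):
    """Approximate (minimum, maximum) output-data time in microseconds."""
    steps = level - 2
    return 1170 + 180 * steps, 1290 + 300 * steps

def data_elements(level):
    """Number of data elements produced at a given index level."""
    return 2 ** (level - 1)

for level in range(2, 9):
    low, high = output_times(level)
    print(level, low, high, data_elements(level))
```

Each added level costs at most 300 µsec but doubles the number of data
elements produced at that level.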

c.  Activity  Function  4 

The timing chart for the g4(x) activity function is shown in Figure III-12.
This function computes the sines and cosines of 42 angles in about 4300
µsec, using a maximum of 26 processors. Except for initial calculations,
four processors operate on each value of the input argument. Two
TABLE III-14 - ACTIVITY FUNCTION 3 MINIMUM AND
MAXIMUM OUTPUT-DATA TIMES, MACHINE I

          Output-data time (µsec)
Level     Minimum     Maximum     Data elements

2         1170        1290        2
3         1350        1590        4
4         1530        1890        8
5         1710        2190        16
6         1890        2490        32
7         2070        2790        64
8         2250        3090        128

processors  calculate  the  sine  and  cosine  of  the  incremental  angle;  two 
more  calculate  the  sine  and  cosine  of  the  base  angle,  plus  the  incre¬ 
mental  angle,  while  index  treeing  calculations  are  being  performed  by 
the  previous  processors. 

In  this  routine,  the  minimum  and  maximum  times  between  index  levels 
increase  at  a  linear  rate;  while  the  quantity  of  data  being  generated 
increases  exponentially  for  a  linear  increase  in  time. 

The approximate minimum and maximum output-data times at the second
through fifth levels of indexing are shown in Table III-15; note that the
minimum increase is 510 µsec, and the maximum increase 690 µsec,
from one level to the next. It is possible to generate 256 sets of sines
and cosines in about 6650 µsec.

d.  Activity  Functions  5  and  6 

The timing charts for the g5(x) and the g6(x) activity functions are shown
in Figures III-13 and III-14, respectively. These functions are similar
to the g1(x) and g2(x) functions in that one processor was assigned to each
function because of the low inherent parallelism.
TABLE III-15 - ACTIVITY FUNCTION 4 MINIMUM AND
MAXIMUM OUTPUT-DATA TIMES, MACHINE I

          Output-data time (µsec)
Level     Minimum     Maximum     Data elements

2         2340        2500        2
3         2850        3200        4
4         3360        3890        8
5         3870        ...

About 450 µsec are required to generate a new element of the output data
vector. The total output vector for g5(x) is available at 4560 µsec; and
for g6(x), at 4710 µsec. In addition, the g5(x) function generates indices
to be transferred to the maximization program.

e. Maximization Function

The data flow diagram for the maximization function is shown in
Figure III-15. It should be noted that the inputs to the maximization
function are sets of indices that allow generation of the maximized return
vectors u1 through u5, referred to here as maximization programs. The
timing chart for the u1 maximization program is shown in Figure III-16;
for the u2 program, in Figure III-17; for the u3 program, in Figure III-18;
for the u4 program, in Figure III-19; and for the u5 program, in
Figure III-20.

The u1 program of the maximization function gets its input indices from
the g2(x) activity function. Each input allows the maximization program
to compute all possible combinations of returns from g1(x) and g2(x) for
the given resource to determine what resource allocation will give the
maximum return. Each time the g2(x) activity function generates a set
of indices for the maximization program, a processor is started. As
the resource to be maximized increases, the time required to maximize
Figure III-13 - Activity Function 5 Timing Chart, Machine I

Figure III-14 - Activity Function 6 Timing Chart, Machine I

Figure III-15 - Maximization Programs Data Flow Diagram, Machine I

the return for this resource increases; for the u1 program, this time
amounts to 7500 µsec (see Figure III-16).

The u2 and u3 programs acquire their inputs from the g4(x) and g5(x)
activity functions, respectively. The u4 and u5 programs are started
from the u3 and u2 programs, respectively.

The maximization programs are started as soon as data are generated
that a program can operate on. The dashed lines on the timing chart
(Figure III-20) represent idle (wheel spinning) time where the arithmetic
unit is looking for a piece of data yet to be generated. Until the data are
generated by the u4 program, the u5 processor is not constructively
useful.

As an explanation of the wheel spinning time, the u2 program
generates indices to start the u5 program, and there is a disparity between
(1) the time the u2 program generates data for the start of u5 and (2) the
time that u4 has corresponding data ready for u5. A solution to the


Figure III-17 - Maximization Program 2 Timing Chart, Machine I
(quantity of resource assigned versus time in microseconds)
problem would be to let u4 generate the index sets for u5. Less
processor time would be wasted, although there would be a small amount
of time at the beginning of u4 and u5 where data was available but no
processor was operating on it.

4.  COMPARISONS  AND  CONCLUSIONS 

The processors used for Machine I are charted in Figure III-21.
Table III-16 compares the IBM 7090 sequential (Appendix II) and the
Machine I parallel execution times for the dynamic programming
problem.

Although the Machine I execution times for functions g1(x), g2(x), g5(x),
and g6(x) are longer than for the 7090, it should be realized that these
routines were not tree'd.


TABLE III-16 - COMPARISON OF IBM 7090 AND MACHINE I
EXECUTION TIMES, DYNAMIC PROGRAMMING PROBLEM

                    7090 sequential time (µsec)    Machine I parallel
Function            Minimum        Maximum         time (µsec)

g1(x)               276            276             3,780
g2(x)               36?            872             3,780
g3(x)               3,75?          ?,623           1,980
g4(x)               2,984          3,993           4,320
g5(x)               682            1,185           4,560
g6(x)               335            462             4,710
Maximization        140,387        208,037         16,460
Lookup              383            383             510
Total               0.15? (sec)    0.224 (sec)     0.016 (sec)
Storage required    570                            734


The square root and sine routines were more complex and exhibited
parallelism that was extracted. In addition, these two functions were
tree'd. The sine routine generated about four times the data that the
same routine generated on the sequential machine.

The maximization function takes significantly less time with Machine I
than the 7090. The reason is that the Machine I memory is used as the
maximizing mechanism. The various combinations of returns are
calculated and then sorted by the memory, with the largest floating to the
top and being retained.

It is relatively easy with Machine I to use a large number of processors
on even a small problem. There is a significant amount of parallelism
in many problems capable of computer solution. Any iterative sequence
where successive loops are independent can be assigned to a group of
processors for parallel execution. Independent programs can be
executed in parallel and can in turn start any dependent programs as
suitable data is generated, keeping active processor time to a minimum.

Machine I has an estimated 512 processing units. All processors were
assumed to have (1) a program counter, (2) an accumulator, (3) a
quotient register, (4) six index registers, and (5) an instruction register.

Each processor has the capability of content addressing any word in
memory and can execute interregister transfers. Data can be
transferred between processors only via the memory. Machine I fetches
and executes two instructions in 30 µsec. The 7090 requires about
8.72 µsec for the same operations. Hence, the 7090 is 3.44 times faster
than Machine I.

The total time for the solution of the dynamic programming problem on
the 7090 ranged from about 150 to 220 msec. The total time for problem
solution on Machine I was 16 msec. Hence, the Machine I solution time
was 9 to 14 times faster than the 7090. Furthermore, this particular
problem required a maximum of only 60 of the available 512 processors
at any one time. An average of only 22 processors was used. Had the
problem been large enough to use the full 512 processors at the peak
period, Machine I would then have had a speed advantage of

(512/60) X (9 - 14) = 76 - 119 to 1.
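
The ratios quoted in this section can be reproduced directly from the stated
figures. The short check below uses only numbers given in the text; the
variable names are illustrative.

```python
# Instruction-pair time: Machine I 30 usec, 7090 about 8.72 usec.
ratio = 30 / 8.72

# Scale the measured 9-to-14 advantage from 60 processors up to 512.
scale = 512 / 60
low, high = scale * 9, scale * 14

print(round(ratio, 2), int(low), int(high))  # 3.44 76 119
```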

In  addition,  many  processors  would  have  been  available  at  nonpeak 
times  for  other  uses  such  as  compiling. 


APPENDIX  IV  -  PROGRAMMING  MANUAL  FOR  MACHINE  I 


1. MACHINE ORGANIZATION FOR MACHINE I

The Machine I parallel processor organization used in this study is
composed of 512 logical processing units and 32,768 words of memory. Each
processing unit has the following registers: program counter, instruction
register, accumulator, quotient register, and six index registers. The
program counter is a 24-bit register that is stepped sequentially to
generate the addresses of the instructions in the program.

The instruction register is 72 bits long. It is divided into two 36-bit
sections, upper and lower, each holding a single 36-bit instruction. In most
cases, an instruction can be located in either the upper or lower half of
an instruction word; in a few cases, the instruction must be located in
the upper half of the word.

The accumulator is a 72-bit register that is considered as one register
for  most  instructions;  in  some  cases,  it  can  be  treated  as  two  36-bit 
registers,  upper  and  lower. 

The  quotient  register  also  is  a  72-bit  register;  it  is  treated  either  as  one 
register  or  two,  depending  on  the  instruction  being  executed. 

Any  combination  of  three  of  the  six  index  registers  included  in  each  proc¬ 
essing  unit  may  be  used.  Their  contents  may  be  added  together  with  the 
contents  of  the  address  field  of  the  instruction  to  obtain  an  operand  ad¬ 
dress,  an  operand,  or  a  shift  count. 

If the indices are 1, 2, and 3, there is no reduction in the size of the
address field of the instruction. If the indices to be added include 4, 5, or
6, then the address field is reduced by 6 bits and bit 28 is set to 1. The
6-bit reduction still leaves an address field of 18 bits. The original 3-bit
index designator field plus two more 3-bit fields are used to contain the
3-bit codes of the specified index registers.
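
The address-field rule can be summarized in a few lines of Python. The
24-bit full width is inferred from the statement that a 6-bit reduction
leaves 18 bits; the function name and the error handling are assumptions
added for illustration.

```python
FULL_FIELD_BITS = 24  # inferred: 18 bits remain after a 6-bit reduction

def address_field_bits(indices):
    """Return the address-field width for a set of index registers."""
    indices = set(indices)
    if len(indices) > 3 or not indices <= {1, 2, 3, 4, 5, 6}:
        raise ValueError("use at most three of index registers 1 to 6")
    # Naming index register 4, 5, or 6 costs 6 bits of address field.
    extended = any(i >= 4 for i in indices)
    return FULL_FIELD_BITS - 6 if extended else FULL_FIELD_BITS
```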

2.  INSTRUCTIONS 

a.  Instruction  Classes 

The  Machine  I  parallel  processor  instructions  may  be  grouped  in  four 
classes. 

Class 1 instructions treat the contents of the address field as the
operand. These instructions can be indexed with any combination of
three index registers. The operand is the result of the addition of
the contents of the specified index registers and the contents of the
address field of the instruction.

Class 2 instructions treat the contents of the address field as an
operand address. These instructions can be indexed with any
combination of three index registers. The final operand address is the result
of the addition of the contents of the specified index registers and the
contents of the address field of the instruction. The word received
by the processor after a fetch type operation is the word next higher
than the request word. When a processor is requesting a word that
may or may not be currently available in memory, an NPJ
instruction following the fetch will enable a comparison of the names of the
requested and received words for an exact match. If they match,
the program continues; if they do not, then a jump is made to the
fetch instruction. A number of desirable operations in this class
require execution of two instructions. This subclass is composed
of single- and multiple-erase instructions that require a fetch type
instruction following the erase instruction.
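
The fetch-then-NPJ pattern can be sketched as follows. The memory model is
an assumption made purely for illustration: a Python list of (name, data)
pairs kept sorted by name, with `fetch` returning the first word whose name
is not lower than the requested one. The function names are hypothetical.

```python
import bisect

def fetch(memory, name):
    """Return the stored word next higher than a request for `name`.

    memory is a list of (name, data) pairs sorted by name.
    """
    pos = bisect.bisect_left(memory, (name,))
    return memory[pos] if pos < len(memory) else None

def fetch_with_npj(memory, name):
    """Fetch, then compare names as an NPJ instruction would.

    Returns (data, True) on an exact match and (None, False) when the
    program would have to jump back and repeat the fetch.
    """
    word = fetch(memory, name)
    if word is not None and word[0] == name:
        return word[1], True
    return None, False
```

A caller that receives `False` would retry until some other processor has
stored a word under the requested name.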

Class 3 instructions treat the contents of the address field as a shift
count. These instructions can be indexed with any combination of
three index registers. The final shift count is the result of the
addition of the contents of the specified index registers and the
contents of the address field of the instruction.

Class  4  instructions  are  special  instructions  that  allow  processor 
unit  interregister  transfers. 

b.  Instructions 

ENA Bn y Enter A Lower

The 24-bit operand Y is entered into A lower. The
most significant bit of Y is extended in A. Bits 25
to 72 of A are replaced with bit 24 of the operand Y.
Y = y + Bn. Y is the sum of y and the contents of
any combination of three index registers.

ENAU Bn y Enter A Upper

The 24-bit operand Y (Y = y + Bn) is entered into
A upper, bits 37 to 60. The most significant bit of
Y is extended in A upper, bits 61 to 72. The
contents of A lower are unchanged.

ENQ Bn y Enter Q Lower

The 24-bit operand Y (Y = y + Bn) is entered into
Q lower. Bits 25 to 72 of Q are replaced with bit
24 of the operand Y.

ENBX Bn y Enter Index Register X

The 24-bit operand Y (Y = y + Bn) is entered into
the specified index register, X, with X = 1, 2, 3,
4, 5, 6. This instruction permits the transfer of
data from the address field of the instruction to the
specified index register or the transfer of the
contents of any group of index registers with y added
or not to the specified index register.
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INA 


1NAU 


INAE 


Example  1: 

(Bl)  =  2 

(52)  =  3  Bn  =  (B2)  +  (B3) 

<B3)  =  -4 
ENB1  B2,  B3  6 

The  contents  of  B1  are  replaced  by  5: 

<B2)  +  (B3)  +  y  =  3-4  +  6  =  5. 

Example  2: 

ENB1  Bl,  B2,  B3  6 

The  contents  of  Bl  are  replaced  by  7: 

(Bl)  +  (B2)  +  (B3)  +  y  =  2  +  3«4  +  6  =  7. 

Bny  Increase  A  Lower 

The  24-bit  operand  Y  (Y  =  y  +  Bn)  is  added  to  the 
least  significant  24  bits  of  A  lower.  No  addition 
takes  place  beyond  bit  24  of  A.  The  contents  of 
any  combination  of  three  index  registers  can  be 
added  to  the  A  register,  bits  1  to  24. 

Bny  Increase  A  Upper 

The  24-bit  operand  Y  (Y  =  y  +  Bn)  is  added  to  the 
least  significant  24  bits  of  A  upper,  bits  37  to  >0. 
No  addition  takes  place  beyond  bit  60  of  A. 

Bny  Increase  A  Lower  with  Extended  Addition 

The  24 -bit  operand  Y  (Y  »  y  +  Bn)  is  added  to  the 
contents  of  the  A  register.  If  the  contents  of  the  A 
register  lower  is  an  instruction  with  an  address  in 
its  address  field,  an  XNAE  instruction  may  result 
in  alteration  o’  -  instruction  operation  code  and 
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INAUE 


INBX 


ISKBX 


tag  field.  Similar  changes  may  occur  in  A  upper 
if  a  carry  propagates  into  A  upper. 

Bny  Increase  A  Upper  with  Extended  Addition 

The  24-bit  operand  Y  {Y  =  y  +  Bn)  is  added  to  the 
contents  of  A  upper.  The  restrictions  on  INAE 
also  apply  to  INAUE. 

Bny  Increase  Index  Register  X 

The  24-bit  operand  Y  (Y  -  y  +  Bn)  is  added  to  the 
contents  of  the  specified  index  register,  X,  with 
X  =  1,  2,  3,  4,  5,  6. 


Example  1: 

<B1)  =  2 
(B2)  =  3 
(B3)  =  -4 
INB1  B2,  B3  6 

The  contents  of  B1  are  increased  by  5; 

Y  =  (B2)  +  (B3)  +  y  =  3-4  +  6  =  5. 

Example  2: 

INB1  Bl,  B2,  B3  6 

The  contents  of  BJ.  are  increased  by  7: 

Y  =  (Bl)  +  (B2)  +  (B3)  +  6  =  2  +  3  -  4  +  6  =  7, 
Bny Index X Skip

X = 1, 2, 3, 4, 5, 6

The contents of the specified index register, X,
with X = 1, 2, 3, 4, 5, 6, are compared with Y.

If the two quantities are equal, the specified index
register is cleared and a full exit is performed. If
the quantities are unequal, the contents of the specified
index register are increased by one, and a half
exit is performed. Normally, this instruction occupies
the upper half of an instruction word. A half
exit then results in execution of the instruction in
the lower half of the instruction word. A full exit
is accomplished by incrementing the program
counter by 1 and executing the upper instruction
of this new instruction word.

LDA Bnm Load A

The contents of A are replaced by the 72-bit operand
contained in storage location M, with M = (m + (Bn)).
M, the operand address, is obtained by adding the
contents of the indicated index registers to m.

LAC Bnm Load A Complement

The contents of A are replaced by the complement
of the 72-bit operand contained in storage location
M, with M = (m + (Bn)).

LDQ Bnm Load Q

The contents of Q are replaced by the 72-bit operand
contained in storage location M, with M = (m + (Bn)).

LDBX Bnm Load Index Register X

The contents of the specified index register X, with
X = 1, 2, 3, 4, 5, 6, are replaced by the least significant
24 bits of M, with M = (m + (Bn)).

Example:

(B1) = 2
(B2) = 3
(B3) = -4

0006
0007  000000000000000000001013
0010

LDB1 B1, B2, B3 6

The contents of B1 are 00001013.

M = (m + (B1) + (B2) + (B3)) = (6 + 2 + 3 - 4) = (7).

ADD  Bnm  Add 

The 72-bit operand contained in location M is added
to the contents of the A register. M = (m + (Bn)).

SUB  Bnm  Subtract 

The  72 -bit  operand  in  location  M  is  subtracted  from 
the contents of A. M = (m + (Bn)).

MLY  Bnm  Multiply 

The  contents  of  storage  location  M  are  multiplied 
by  the  contents  of  the  A  register.  The  144-bit 
product  is  contained  in  AQ. 

DVD  Bnm  Divide 

The  contents  of  AQ  are  divided  by  the  contents  of 
storage  location  M.  The  quotient  is  left  in  A  and 
the  remainder  in  Q. 

FAD  Bnm  Floating  Add 

The  sum  of  two  72 -bit  floating  point  operands  is 
formed.  The  floating  point  operand  in  M  is  added 
to  the  floating  point  operand  in  A.  The  result  is 
normalised  and  rounded. 
FSB Bnm Floating Subtract

The difference of two 72-bit floating point operands
is formed. The contents of storage address M are
subtracted from the contents of A. The result is
rounded and normalized.

FMP Bnm Floating Multiply

The floating point contents of storage location M
are multiplied by the floating point contents of the
A register. The product is rounded and normalized
in A.

FDV Bnm Floating Divide

The floating point contents of A are divided by the
floating point contents of storage location M. The
floating point quotient is retained in A.

*STA Bnm Store A

(The asterisk indicates a significantly new instruction.)

The contents of the A register are re-created in a
memory word with address M. Every store instruction
results in the creation of a new word in
memory with an address of M. It is possible that
a number of words in memory may have the same
address and be ordered according to the contents
of their data fields. If this situation cannot be tolerated,
then (1) care should be exercised to assign
each word stored a unique address or (2) all
words in memory with this address should be
erased before the new word is created. The
ability to store a number of words with identical
addresses is desirable since after the next machine
cycle the vector of words will be sorted according
to their data fields, smallest to the largest.
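
The store behavior described above can be modeled with a short Python sketch. It is a software analogy only, not the hardware mechanism: the class name `SortingMemory` and its methods are hypothetical, the ordering is assumed to be by address first and then by data field, and data fields are assumed to be non-negative.

```python
import bisect


class SortingMemory:
    """Toy model of a memory in which every store creates a new word."""

    def __init__(self):
        self.words = []  # (address, data) pairs, always kept in sorted order

    def store(self, address, data):
        # A store never overwrites; it adds another word with this address.
        bisect.insort(self.words, (address, data))

    def erase_all(self, address):
        # Option (2) in the text: erase every word with this address first.
        self.words = [w for w in self.words if w[0] != address]

    def vector(self, address):
        # All data fields sharing one address, smallest to largest.
        return [data for addr, data in self.words if addr == address]


mem = SortingMemory()
for value in (6, 2, 4):
    mem.store(0o7, value)
```

Three stores to address 0007 leave a three-word vector at that address, ordered by data field, which is the behavior the text describes as desirable.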

*STQ Bnm Store Q

The contents of the Q register are re-created in a
memory word with address M (see *STA instruction
above).

*STBX Bnm Store Index X

X = 1, 2, 3, 4, 5, 6

The contents of the specified index register, X,
with X = 1, 2, 3, 4, 5, 6, are re-created in a
memory word with address M. The 24-bit contents
of the index register are stored in the least
significant 24 bits of the lower portion of the word.

Example:

(B1) = 2

(B2) = 3

(B3) = -4

STB1 B1, B2, B3 6

The contents of index register 1 are stored in
location 7:

0007  000000000000000000000002

If there had been other words in memory with
an address of 0007, the newly created word
may have been another element of the vector of
words with address 0007:

0007  ooooooot  oooonoooooooor* 
0007  000000000000000000000002

0007  000000000000000000000006 

ARS Bnk A Right Shift

The contents of A are shifted right K places, with
K = k + (Bn). The sign is extended and the lower
bits discarded.

QRS Bnk Q Right Shift

The contents of Q are shifted right K places. The
sign is extended and the lower bits discarded. K =
k + (Bn)

LRS Bnk Long Right Shift

The contents of AQ are shifted right K places. The
sign of A is extended and the lower-order bits of A
replace the higher-order bits of Q. The lower-order
bits of Q are discarded. K = k + (Bn)

ALS Bnk A Left Shift

The contents of A are shifted left circular K places.
The higher-order bits of A replace the lower-order
bits. K = k + (Bn)

QLS Bnk Q Left Shift

The contents of Q are shifted left circular K places.
The higher-order bits of Q replace the lower-order
bits. K = k + (Bn)

LLS Bnk Long Left Shift

The contents of AQ are shifted left circular K places.
The higher-order bits of A replace the lower-order
bits of Q. The higher-order bits of Q replace the
lower-order bits of A. K = k + (Bn)
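
The two shift styles above (right shift with sign extension, left shift circular) can be illustrated with a minimal Python sketch. Registers are modeled as unsigned bit patterns; the function names and the `width` parameter are assumptions for illustration, with 72 bits standing for A or Q and 144 bits for the combined AQ register.

```python
def right_shift(value, k, width=72):
    """Shift right k places, extending the sign bit and discarding low bits."""
    mask = (1 << width) - 1
    k = min(k, width)
    sign = (value >> (width - 1)) & 1
    shifted = value >> k
    if sign:
        # Fill the k vacated high-order positions with copies of the sign bit.
        shifted |= mask & ~(mask >> k)
    return shifted


def left_circular(value, k, width=72):
    """Shift left k places; bits leaving the high end re-enter at the low end."""
    mask = (1 << width) - 1
    k %= width
    return ((value << k) | (value >> (width - k))) & mask
```

Passing `width=144` gives the long forms (LRS, LLS), since AQ then behaves as a single register whose upper half is A and whose lower half is Q.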
SCA Bnk Scale A

The contents of A are shifted left circularly until
the most significant bit is to the right of the sign
bit or until k = 0. Shift count k is reduced by one
for each shift. The operation terminates when k = 0
or when the most significant bit is to the right of
the sign bit. Upon termination, the count (scale factor)
is entered in the specified index register. K = k +
(Bn)

SCQ Bnk Scale Q

The contents of AQ are shifted left circularly until
the most significant bit is to the right of the sign
bit of A. Shift count k is reduced by one for each
shift. The operation terminates when k = 0 or
when the most significant bit is to the right of the
sign bit. Upon termination the count (scale factor)
is entered in the specified index register. K = k +
(Bn)

AND Bnm Logical AND

The contents of A are replaced by the logical AND
of Q and the contents of M. M = m + (Bn)

OR Bnm Logical OR

The contents of A are replaced by the logical OR of
Q and the contents of M. M = m + (Bn)

EOR Bnm Exclusive OR

The contents of A are replaced by the exclusive OR
of the contents of Q and the contents of M. M =
m + (Bn)
JMP Bnm Jump

Program control is transferred to location M.
Normally, the program counter is incremented
by one for each instruction word executed. In
the case of a jump instruction, the contents of
the program counter are replaced by M and the
instruction at this address is executed next. M =
m + (Bn)

JNGA Bnm Jump if A Is Negative

Program control is transferred to location M if
the sign of A is negative. If the sign of A is positive,
the next sequential instruction is executed.
M = m + (Bn)

JNZA Bnm Jump if A Is Nonzero

Program control is transferred to location M if
the contents of A are not zero. If the contents of
A are zero, the next sequential instruction is executed.
M = m + (Bn)

JNGQ Bnm Jump if Q Is Negative

Program control is transferred to location M if
the contents of Q are negative. If the contents of
Q are positive, the next sequential instruction is
executed.

JNZQ Bnm Jump if Q Is Nonzero

Program control is transferred to location M if
the contents of Q are not zero. If the contents of
Q are zero, the next sequential instruction is executed.
COM Bnm Compare

The contents of A are compared with the contents
of M. If (A) is equal to or greater than (M), a
half exit is performed. If (A) is less than (M),
a full exit is performed. The compare instruction
normally is an upper instruction.

BJPX Bnm Index X Jump

If the contents of the specified index register X,
with X = 1, 2, 3, 4, 5, 6, are not zero, the
quantity is reduced by one and a jump is executed
to location M. If the contents of the specified index
register are zero, the next sequential instruction
is executed. M = m + (Bn)

*NPJ Bnm Nonpresence Jump

A jump to location M is executed if the address of
the word obtained from storage is not the same as
the operand address in the instruction. M = m +
(Bn)

*LDBX  UA  Load Index X  Upper A
       LA                Lower A
       UQ                Upper Q
       LQ                Lower Q

The least significant 24 bits of the upper or lower
halves of A or Q are loaded into the specified index
register, X, with X = 1, 2, 3, 4, 5, 6.

*ADBX  UA  Store Index X  Upper A
       LA                 Lower A
       UQ                 Upper Q
       LQ                 Lower Q

The contents of the specified index register, X,
with X = 1, 2, 3, 4, 5, 6, are stored in the least
significant 24 bits of the upper or lower half of A
or Q.

*SEH α, β Single Erase High (FETCH-type instruction)

The contents of the location which is next larger
than the lower limit word β are fetched and then
erased from memory. α and β are the upper and
lower limit words that bracket a block of data in
memory. The data field of word α is maximum
positive while the data field of word β is zero. The
word fetched and erased is the word next larger
than the lower limit word β.

α = (m + (Bn)), β = (m + (Bn))

*SEA Single Erase A (FETCH)
*SEQ Single Erase Q (FETCH)

The upper limit word α has its contents set to the
contents of A or Q. The operation is the same as
for SEH.

*MEH Multiple Erase High (FETCH)

The execution of this instruction is the same as for
SEH except that all words between the limits α and
β are erased.

*MEA Multiple Erase A (FETCH)
*MEQ Multiple Erase Q (FETCH)

The execution of these instructions is the same as
for SEA and SEQ except that all words between the
limits α and β are erased.

*SEHA Single Erase High A (FETCH)
*SEHQ Single Erase High Q (FETCH)

The execution of these instructions is the same as
for SEH except that the lower limit word is the
same as the contents of A or Q.

*MEHA Multiple Erase High A (FETCH)
*MEHQ Multiple Erase High Q (FETCH)

The execution of these instructions is the same as
for MEH except that the lower limit word is the
same as the contents of A or Q.

*SEAA Single Erase AA (FETCH)
*SEQQ Single Erase QQ (FETCH)

The execution of these instructions is the same as
for SEA except that the lower limit word is the
same as the contents of A or Q.

*SEQA Single Erase between Limits QA (FETCH)
*SEAQ Single Erase between Limits AQ (FETCH)

The execution of these instructions is the same as
for SEH except that the upper limit word α is the
same as the contents of Q or A and the lower limit
word β is the same as the contents of A or Q.

*MEAA Multiple Erase between Limits AA (FETCH)
*MEQQ Multiple Erase between Limits QQ (FETCH)

The execution of these instructions is the same as
for SEAA and SEQQ except that all words are erased
between the limit words.

*MEQA Multiple Erase between Limits QA (FETCH)
*MEAQ Multiple Erase between Limits AQ (FETCH)

The execution of these instructions is the same as
for SEQA and SEAQ except that all words are erased
between the limit words.


*NHA  Next  Higher  than  A 

FETCH 

*NHQ  Next  Higher  than  Q 

FETCH 

The word next higher than the lower limit word β
with contents equal to contents of A or Q is fetched
from memory.
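
The fetch-type instructions above can be pictured with a small Python model. This is a sketch under stated assumptions, not the machine's definition: a block of memory is represented as an ascending list of data fields, both limits are treated as exclusive, and the function names (`next_higher`, `single_erase`, `multiple_erase`) are hypothetical. What a multiple erase returns to the program is not stated in the text; returning the erased words is an assumption made here for inspection.

```python
import bisect


def next_higher(block, beta):
    """NHA/NHQ-style fetch: the word next larger than the lower limit beta."""
    i = bisect.bisect_right(block, beta)
    return block[i] if i < len(block) else None


def single_erase(block, beta, alpha):
    """SEH-style: fetch and erase the word next larger than beta, below alpha."""
    i = bisect.bisect_right(block, beta)
    if i < len(block) and block[i] < alpha:
        return block.pop(i)
    return None


def multiple_erase(block, beta, alpha):
    """MEH-style: erase every word strictly between beta and alpha."""
    lo = bisect.bisect_right(block, beta)
    hi = bisect.bisect_left(block, alpha)
    erased = block[lo:hi]
    del block[lo:hi]
    return erased


demo_block = [3, 5, 5, 9, 12]                        # ascending data fields
fetched = next_higher(demo_block, 5)                 # word next higher than 5
first = single_erase(demo_block, 0, 2**71 - 1)       # limits: zero, max positive
```

The variants that take a limit from A or Q differ only in where `beta` or `alpha` comes from, so the same three functions cover them in this model.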

*BGN  Bny  Begin  Y 

Y  is  added  to  (A)  upper.  The  contents  of  A  are  then 
stored  in  the  WAIT  LIST.  This  instruction  is  used 
to  start  arithmetic  units.  Prior  to  the  BGN  instruc¬ 
tion,  A  has  been  loaded  with  an  LDI  ,  JMP 
PROG  instruction.  Execution  of  the  BGN  results 
in  the  addition  of  Y  to  the  LDI  instruction  and  the 
storage  of  this  instruction  pair  in  the  WAIT  LIST. 
The  address  that  is  added  is  the  address  of  the 
location  where  the  indices  to  be  transferred  are 
stored. 


3.  NUMBER  REPRESENTATION 

A fixed point number consists of a sign bit and coefficient. The upper bit
of a fixed point number designates the sign of the coefficient. If bit 71
is 1, the quantity is negative; a 0 sign bit signifies a positive coefficient.
The coefficient may be an integer or fraction. The binary point, in the
case of an integer, is assumed to be immediately to the right of the lowest
order bit; for a fraction, the point is put to the right of the sign bit.

Floating point numbers are represented by a coefficient and an exponent.
The coefficient consists of a 60-bit fraction in the lower 60 positions of
the floating point word. The coefficient is a normalized fraction equal
to or greater than 1/2 but less than 1. The highest order position, bit
71, is the sign of the coefficient. If the sign bit is 0, the coefficient is
positive; if the sign bit is 1, the fraction is negative and in one's complement
form.

The floating point exponent is an 11-bit quantity with a value ranging from
0000 to 3777 (octal). It is formed by adding a true positive exponent and a
bias of 2000 (octal) or a true negative exponent and a bias of 1777 (octal).
This results in a range of biased exponents as shown below (all values octal).

True positive    Biased      True negative    Biased
exponent         exponent    exponent         exponent

+0               2000        -0               2000
+1               2001        -1               1776
+2               2002        -2               1775
 .                .           .                .
+1776            3776        -1776            0001
+1777            3777        -1777            0000
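
The biasing rule in the table can be checked with a few lines of Python. This is a minimal sketch: the function names are assumptions, exponents are ordinary Python integers written in octal, and the -0 row is not modeled separately because an integer zero has no sign (the table gives it the same biased value as +0).

```python
def biased_exponent(true_exponent):
    """Apply the bias: 2000 (octal) if non-negative, 1777 (octal) if negative."""
    if not -0o1777 <= true_exponent <= 0o1777:
        raise ValueError("true exponent outside the 11-bit biased range")
    bias = 0o2000 if true_exponent >= 0 else 0o1777
    return true_exponent + bias


def true_exponent(biased):
    """Invert the bias for a biased exponent in the range 0000 to 3777 (octal)."""
    return biased - 0o2000 if biased >= 0o2000 else biased - 0o1777
```

For instance, `oct(biased_exponent(-1))` gives `'0o1776'`, matching the table row for a true exponent of -1.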
APPENDIX V - BI-TONIC SORTING


1. INTRODUCTION

In a previous company report,^a a new internal sorting method was discussed.
Discussed here is another method that, while not as efficient as
the referenced method, has certain advantages in parallel processors. To
sort $2^n$ words, the method presented here (bi-tonic sorting) requires
$n(n + 1)2^{n-2}$ comparisons (with exchanges) while that of the referenced
method requires only $(n^2 - n + 4)2^{n-2} - 1$ comparisons.


2.  BI-TONIC  SEQUENCES 
Definition: If

$$A = a_1, a_2, \ldots, a_n$$

and

$$B = b_1, b_2, \ldots, b_n$$

are sequences of numbers, then B is a circular permutation of A if and
only if there exists an integer, k, with $0 \le k \le n - 1$, so that

$$b_i = a_{i+k}$$

for all i's satisfying

$$1 \le i \le n - k,$$

and

$$b_i = a_{i+k-n}$$

for all i's satisfying

$$n - k + 1 \le i \le n.$$

Definition:

$$A = a_1, a_2, \ldots, a_n$$

is said to be a bi-tonic sequence if there exists a circular permutation of
A, $B = b_1, b_2, \ldots, b_n$, and an integer i, with $1 \le i \le n$, so that

$$b_1 \ge b_2 \ge \cdots \ge b_{i-1} \ge b_i \le b_{i+1} \le \cdots \le b_{n-1} \le b_n.$$

^a GER-11759: A New Internal Sorting Method. Akron, Ohio, Goodyear Aerospace
Corporation, 29 September 1964.


It is easy to see that any monotonic sequence is bi-tonic, as is the
concatenation of any ascending sequence with any descending sequence.


Theorem 1: Any subsequence of a bi-tonic sequence is bi-tonic. Proof:
Let A be any bi-tonic sequence and A' any subsequence of A. The theorem
need only be proved for any circular permutation of A. Letting
$B = b_1, b_2, \ldots, b_n$ be a circular permutation of A where

$$b_1 \ge b_2 \ge \cdots \ge b_i \le b_{i+1} \le \cdots \le b_n,$$

it can be seen that any subsequence of B is bi-tonic. This concludes the
proof.

Definition: If

$$A = a_1, a_2, \ldots, a_{pq},$$

where p and q are integers and $1 \le i \le p$, then the sequence $A_{i,p}$ is
defined as

$$A_{i,p} = a_i, a_{i+p}, a_{i+2p}, \ldots, a_{i+(q-1)p}.$$

Definition: If

$$A = a_1, a_2, \ldots, a_{pq},$$

then the sequence $A'_{i,p}$ is defined as the sequence $A_{i,p}$ rearranged into
ascending order.

Definition: If

$$A = a_1, a_2, \ldots, a_{pq},$$

the derived sequence $A^{(j,q)}$ for $1 \le j \le q$ is defined as a sequence of p
terms whose ith term is the jth term of $A'_{i,p}$.

Example: Let p = 4, q = 3, and A = 7, 9, 13, 20, 17, 15, 10, 8, 4, 1, 3, 5.
When A is written as terms of a 3-by-4 matrix (across the first row, then
across the second row, etc.):

$$\begin{pmatrix} 7 & 9 & 13 & 20 \\ 17 & 15 & 10 & 8 \\ 4 & 1 & 3 & 5 \end{pmatrix}$$

then the first column is the sequence $A_{1,4}$ = 7, 17, 4, the second column
is the sequence $A_{2,4}$ = 9, 15, 1, etc. If each column of this matrix is
rearranged into ascending order:

$$\begin{pmatrix} 4 & 1 & 3 & 5 \\ 7 & 9 & 10 & 8 \\ 17 & 15 & 13 & 20 \end{pmatrix}$$

then $A'_{1,4}$ = 4, 7, 17 is the first column, $A'_{2,4}$ = 1, 9, 15 is the second
column, etc., and $A^{(1,3)}$ = 4, 1, 3, 5 is the first row, $A^{(2,3)}$ = 7, 9, 10, 8
is the second row, etc.

In the above example, A is bi-tonic since one circular permutation of A is
20, 17, 15, 10, 8, 4, 1, 3, 5, 7, 9, 13. As predicted by theorem 1, the
subsequences $A_{1,4}$ = 7, 17, 4; $A_{2,4}$ = 9, 15, 1; $A_{3,4}$ = 13, 10, 3; and
$A_{4,4}$ = 20, 8, 5 also are bi-tonic. In the example, the derived sequences
$A^{(1,3)}$ = 4, 1, 3, 5; $A^{(2,3)}$ = 7, 9, 10, 8; and $A^{(3,3)}$ = 17, 15, 13, 20
are bi-tonic and, furthermore, $A^{(1,3)}$ has the least four members of A,
$A^{(2,3)}$ has the middle four, and $A^{(3,3)}$ has the greatest four members of
A. Hence, to reorganize A into ascending order, it is sufficient to rearrange
each of the three bi-tonic derived sequences into ascending order
and concatenate them. Theorem 2 shows this is true for any bi-tonic sequence.
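
The column and derived-sequence definitions can be exercised on the example with a short Python sketch. The function names are illustrative assumptions; lists are zero-based, so the term written $a_i$ in the text is `a[i - 1]` here.

```python
def columns(a, p):
    """The sequences A_{i,p} for i = 1..p: every p-th term of a, from term i."""
    return [a[i::p] for i in range(p)]


def derived_sequences(a, p):
    """The sequences A^{(j,q)} for j = 1..q.

    Each column A_{i,p} is sorted ascending (giving A'_{i,p}); row j of the
    resulting matrix is the derived sequence A^{(j,q)}.
    """
    sorted_columns = [sorted(col) for col in columns(a, p)]
    q = len(a) // p
    return [[col[j] for col in sorted_columns] for j in range(q)]


A = [7, 9, 13, 20, 17, 15, 10, 8, 4, 1, 3, 5]
rows = derived_sequences(A, 4)
# Sorting each derived sequence and concatenating gives A in ascending order.
reassembled = [x for row in rows for x in sorted(row)]
```

Here `rows` reproduces the three derived sequences of the example, and `reassembled` equals `sorted(A)`, which is the property theorem 2 establishes in general.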

Theorem 2: If

$$A = a_1, a_2, \ldots, a_{pq}$$

is bi-tonic, then each derived sequence $A^{(j,q)}$, where $1 \le j \le q$, is
bi-tonic and

$$\max[A^{(1,q)}] \le \min[A^{(2,q)}],$$
$$\max[A^{(2,q)}] \le \min[A^{(3,q)}],$$
$$\vdots$$
$$\max[A^{(q-1,q)}] \le \min[A^{(q,q)}].$$

Proof: Consider A written in matrix form:

$$\begin{pmatrix}
a_1 & a_2 & a_3 & \cdots & a_p \\
a_{p+1} & a_{p+2} & a_{p+3} & \cdots & a_{2p} \\
a_{2p+1} & a_{2p+2} & a_{2p+3} & \cdots & a_{3p} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{(q-1)p+1} & a_{(q-1)p+2} & a_{(q-1)p+3} & \cdots & a_{pq}
\end{pmatrix}$$


and  observe  that  a  circular  permutation  of  A  is  equivalent  to  a  circular 
permutation  of  the  columns  plus  a  circular  permutation  within  each  column. 
The  effect  of  a  circular  permutation  within  any  column  is  cancelled  when 
the  column  is  rearranged  into  ascending  order.  Hence,  a  circular  per¬ 
mutation  of  A  causes  a  circular  permutation  within  each  derived  sequence; 
this  does  not  affect  the  bi-tonic  property  and  maximums  and  minimums.  It 
is  concluded  that  it  is  sufficient  to  prove  the  theorem  for  any  circular  per¬ 
mutation  of  A. 


Pick a circular permutation of A, $B = b_1, b_2, \ldots, b_{pq}$, for which there
is an integer j, with $1 \le j \le pq$, so that

$$b_1 \ge b_2 \ge \cdots \ge b_{j-1} \ge b_j \le b_{j+1} \le \cdots \le b_{pq-1} \le b_{pq}.$$

Let r and s be the integers defined by

$$rp + s = j \quad \text{and} \quad 1 \le s \le p.$$

B in matrix form is

$$\begin{pmatrix}
b_1 & b_2 & \cdots & b_p \\
b_{p+1} & b_{p+2} & \cdots & b_{2p} \\
\vdots & \vdots & & \vdots \\
b_{rp+1} & b_{rp+2} & \cdots & b_{(r+1)p} \\
\vdots & \vdots & & \vdots \\
b_{(q-1)p+1} & b_{(q-1)p+2} & \cdots & b_{pq}
\end{pmatrix}$$

It is easy to see that for any k, with $1 \le k \le p$,

$$\max[b_k, b_{p+k}, b_{2p+k}, \ldots, b_{(q-1)p+k}] = \max[b_k, b_{(q-1)p+k}],$$

so that after each column is rearranged into ascending order each term of
row q, the derived sequence $B^{(q,q)}$, comes from row 1 or row q. The proof
is divided into three cases: $0 < r < q - 1$, $r = 0$, and $r = q - 1$.

For $0 < r < q - 1$, the inequalities

$$b_1 \ge b_2 \ge b_3 \ge \cdots \ge b_p$$
$$b_{(q-1)p+1} \le b_{(q-1)p+2} \le \cdots \le b_{pq}$$

hold, so if $b_k \le b_{(q-1)p+k}$ for some k, where $1 \le k < p$, then
$b_{k+1} \le b_{(q-1)p+k+1}$. This, together with

$$B^{(q,q)} = \max[b_1, b_{(q-1)p+1}], \max[b_2, b_{(q-1)p+2}], \ldots, \max[b_p, b_{pq}],$$

implies that for some integer t, where $0 \le t \le p$,

$$B^{(q,q)} = b_1, b_2, \ldots, b_{t-1}, b_t, b_{(q-1)p+t+1}, b_{(q-1)p+t+2}, \ldots, b_{pq}.$$

If $1 \le t \le p - 1$, $B^{(q,q)}$ is bi-tonic and its minimum is
$\min[b_t, b_{(q-1)p+t+1}]$. Let $C = b_{t+1}, b_{t+2}, \ldots, b_{(q-1)p+t}$;
then C is bi-tonic with a maximum of $\max[b_{t+1}, b_{(q-1)p+t}]$. The
inequalities

$$b_t \ge b_{t+1},$$
$$b_t \ge b_{(q-1)p+t},$$
$$b_{(q-1)p+t+1} \ge b_{t+1},$$
$$b_{(q-1)p+t+1} \ge b_{(q-1)p+t}$$

are established so $\min[B^{(q,q)}] \ge \max(C)$. If t = 0 or t = p, $B^{(q,q)}$
is monotonic, hence bi-tonic, C is bi-tonic, and $\min[B^{(q,q)}] \ge \max(C)$.

The r = 0 case can be reduced to the case r = q - 1 by inverting the order
of the terms of B; this operation inverts each derived sequence and does
not affect the bi-tonic property, maximums, and minimums.


For r = q - 1,

$$b_1 \ge b_2 \ge \cdots \ge b_{p-1} \ge b_p \ge \cdots \ge b_{(q-1)p+1} \ge b_{(q-1)p+2} \ge \cdots$$
$$\ge b_{(q-1)p+s-1} \ge b_{(q-1)p+s} \le b_{(q-1)p+s+1} \le \cdots \le b_{pq-1} \le b_{pq}.$$

If $b_{pq} \le b_p$, then $B^{(q,q)} = b_1, b_2, \ldots, b_p$, which is monotonic, hence
bi-tonic, with a minimum of $b_p$. $C = b_{p+1}, b_{p+2}, \ldots, b_{pq}$ is
bi-tonic with a maximum of $\max(b_{p+1}, b_{pq})$, so $\min[B^{(q,q)}] \ge \max(C)$.

If $b_{pq} > b_p$, then there is an integer t, $s \le t \le p - 1$, so that
$b_t \ge b_{(q-1)p+t}$ and $b_{t+1} \le b_{(q-1)p+t+1}$.

$$B^{(q,q)} = b_1, b_2, \ldots, b_t, b_{(q-1)p+t+1}, b_{(q-1)p+t+2}, \ldots, b_{pq}$$

is bi-tonic with a minimum of $\min[b_t, b_{(q-1)p+t+1}]$.

$$C = b_{t+1}, b_{t+2}, \ldots, b_{(q-1)p+t}$$

is bi-tonic with a maximum of $\max[b_{t+1}, b_{(q-1)p+t}]$ and again
$\min[B^{(q,q)}] \ge \max(C)$. This concludes the case r = q - 1.

In all three cases, $B^{(q,q)}$ is bi-tonic and if $C = B - B^{(q,q)}$, C is
bi-tonic and $\min[B^{(q,q)}] \ge \max(C)$. C has (q - 1)p terms. It is not hard
to see that the derived sequences $C^{(1,q-1)}, C^{(2,q-1)}, \ldots, C^{(q-1,q-1)}$
of C will be circular permutations of $B^{(1,q)}, B^{(2,q)}, \ldots, B^{(q-1,q)}$,
respectively. Hence, the above proof could be carried out on C to show
that $C^{(q-1,q-1)}$ [and therefore $B^{(q-1,q)}$] is bi-tonic and
$\min[C^{(q-1,q-1)}] \ge \max[C - C^{(q-1,q-1)}]$. Therefore,
$\min[B^{(q-1,q)}] \ge \max[B - B^{(q,q)} - B^{(q-1,q)}]$. Iteration of this
process proves theorem 2.


3.  BI-TONIC  SORTING  OPERATORS 

For any integer n > 1, let $N_n$ be an operator that, when applied to any
bi-tonic sequence of length n, causes the terms of it to be rearranged into
ascending order. Theorems 1 and 2 show that for any integers p > 1 and
q > 1, $N_{pq}$ can be constructed from p applications of $N_q$ and q applications
of $N_p$. The operator equation is

$$N_{pq}(A) = N_q(A_{1,p}) N_q(A_{2,p}) \cdots N_q(A_{p,p}) \, N_p[A^{(1,q)}] N_p[A^{(2,q)}] \cdots N_p[A^{(q,q)}].$$

Or, using $\prod$ notation,

$$N_{pq}(A) = \left[ \prod_{i=1}^{p} N_q(A_{i,p}) \right] \cdot \left[ \prod_{j=1}^{q} N_p[A^{(j,q)}] \right].$$

A special case is when $p = q^{t-1}$:

$$N_{q^t}(A) = \left[ \prod_{i=1}^{q^{t-1}} N_q(A_{i,q^{t-1}}) \right] \cdot \left[ \prod_{j=1}^{q} N_{q^{t-1}}[A^{(j,q)}] \right].$$

Repeated application of this equation allows construction of $N_{q^t}$ from
$N_q$ operators. When q = 2,

$$N_{2^t}(A) = \left[ \prod_{i=1}^{2^{t-1}} N_2(A_{i,2^{t-1}}) \right] \cdot N_{2^{t-1}}[A^{(1,2)}] \cdot N_{2^{t-1}}[A^{(2,2)}].$$

The $N_2$ operator is a comparison of two numbers with an exchange if they
are in the wrong order ($N_2$ is the same as $Q$ in GER-11759).^a The $N_{2^t}$
operator will have $t \, 2^{t-1}$ $N_2$ operators.

^a Ibid.
A bi-tonic sequence can be formed from any two ascending sequences by
inverting one sequence and concatenating them. The bi-tonic sequence
then can be sorted by means of an N operator; the result is the merge of
the ascending sequences. Hence, the operator $N_n$ is equivalent to $M_{m,k}$
of the referenced report for any m, k where m + k = n.
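
The q = 2 construction and its use as a merge can be sketched recursively in Python. This is an illustrative software rendering, not the report's hardware network: the function names and the optional `counter` argument (a one-element list used to tally compare-exchange steps) are assumptions, and the input length is assumed to be a power of two.

```python
def bitonic_sort_operator(a, counter=None):
    """Sort a bi-tonic sequence of length 2**t into ascending order.

    One layer of compare-exchange steps pairs term i with term i + p
    (p = half the length); the lower results and the higher results are
    the two derived sequences, each sorted by the half-size operator.
    """
    n = len(a)
    if n == 1:
        return list(a)
    p = n // 2
    low, high = [], []
    for i in range(p):
        x, y = a[i], a[i + p]
        if counter is not None:
            counter[0] += 1          # one comparison-with-exchange
        low.append(min(x, y))
        high.append(max(x, y))
    return bitonic_sort_operator(low, counter) + bitonic_sort_operator(high, counter)


def merge_ascending(x, y, counter=None):
    """Merge two ascending sequences by inverting one and concatenating."""
    return bitonic_sort_operator(list(x) + list(reversed(y)), counter)


count = [0]
result = bitonic_sort_operator([20, 17, 15, 10, 8, 4, 1, 3], count)
merged = merge_ascending([1, 4, 9, 12], [2, 3, 10, 11])
```

For the eight-term input (t = 3) the tally in `count[0]` comes to 12 compare-exchange steps, in line with the count of $t \, 2^{t-1}$ for a sequence of length $2^t$.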


In general, $N_{m+k}$ uses more operators than $M_{m,k}$. This is the
price to obtain a merge operator that is dependent only on m + k and not
on m and k separately; for example, in the referenced report,
$M_{2^{t-1},2^{t-1}}$ uses $(t - 1)2^{t-1} + 1$ operators but $N_{2^t}$ has the
advantage that it can be used to merge any two ascending sequences whose
combined length is $2^t$, where $M_{2^{t-1},2^{t-1}}$ can only be used when both
sequences have the same length, $2^{t-1}$.

If a sort of $2^n$ numbers is conducted using N operators for merging, the
sort will use $(n^2 + n)2^{n-2}$ operators whereas, in the referenced report,
only $(n^2 - n + 4)2^{n-2} - 1$ operators are required using M operators.
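
The operator count for the full sort can be checked by a short derivation. It assumes the sort proceeds by repeated merging, so that level t (for t = 1, ..., n) performs $2^{n-t}$ merges, each an $N_{2^t}$ operator built from $t \, 2^{t-1}$ comparison-exchange operators; the level structure is an assumption made here for illustration.

```latex
\begin{aligned}
\sum_{t=1}^{n} 2^{\,n-t} \cdot t \, 2^{\,t-1}
  &= 2^{\,n-1} \sum_{t=1}^{n} t \\
  &= 2^{\,n-1} \cdot \frac{n(n+1)}{2} \\
  &= (n^2 + n) \, 2^{\,n-2}.
\end{aligned}
```

For n = 3, for example, this gives 12 · 2 = 24 operators to sort eight numbers.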


4.  CONCLUSIONS 

This report indicates how a merge operator $N_{2^t}$ can be constructed from
$Q$ operators that will merge any two ascending sequences whose combined
length is $2^t$. $N_{2^t}$ is more versatile than $M_{m,2^t-m}$ of the referenced
report because of this but it uses slightly more operators.


APPENDIX  VI  -  BASIC  ORGANIZATION  OF  MACHINE  I 


1. INTRODUCTION

This  report  describes  a  computer  organization  consisting  of  several 
arithmetic units and several I/O channels interconnected by a multiaccess
self-sorting  memory.  All  arithmetic  units  and  I/O  channels  can  access 
the  memory  at  the  same  time  with  no  conflict,  even  when  two  or  more 
units  access  the  same  word.  The  sorting  capability  of  the  memory  allows 
fast  sorting  and  searching  of  tables  and  a  form  of  content  addressing. 

This  capability,  together  with  the  parallel  arithmetic  capability,  gives  the 
processor  a  fast  processing  speed  on  most  classes  of  problems. 

2. THE PROBLEM OF ACCESSING DATA IN COMPUTER ORGANIZATIONS

Initially,  higher  processing  speeds  in  digital  computers  were  obtained 
mostly  by  using  faster  components.  Later,  higher  speeds  were  obtained 
mostly  by  doing  operations  simultaneously  that  previously  were  done  one 
at  a  time;  for  example,  the  first  computers  suspended  computations  during 
I/O  operations  while  later  machines  do  both  simultaneously.  The  latest 
large-scale  processors,  such  as  the  IBM  Stretch  and  the  CDC-6600,  allow 
several  I/O  and  arithmetic  operations  to  take  place  simultaneously. 

In  the  future,  computers  with  a  large  number  of  simultaneously  operating 
arithmetic  units  (ALU's)  and  I/O  channels,  perhaps  in  the  hundreds,  can 
be  expected.  The  major  problem  in  such  a  computer  is  that  of  giving  all 
units  fast  access  to  the  data  that  they  need. 

This problem exists in present-day computers where memories are divided
into several banks so that while one channel is accessing one memory
bank other channels can be accessing other banks. When two or more
channels need access to the same memory bank, one of the channels is
permitted access and the others wait. It can be expected that the frequency
of these conflicts will be very high if there are hundreds of channels,
and thus this method does not appear to be promising. This is
especially true if data are retrieved by content rather than by address,
since a given item might be in any memory bank and a channel must look
in all the memory banks for the item. In this case, dividing the memory
into banks does little to increase the overall processing speed.

When the class of problems to be solved by a particular machine is restricted,
the machine can be tailored to prevent memory conflicts. An
example of this is the SOLOMON computer,^1,2 where each processing
element is allowed access only to its four neighbors (right, left, up, down).
On certain problems with a rectangular structure (matrices, partial-difference
equations, etc.), the SOLOMON computer achieves a fast
processing speed because each processing element only needs access to
its four neighbors; for other problems, the time spent in shuffling
operands to the elements needing them will slow the processing speed
drastically.

The above discussion exhibits the need for a memory capable of
performing hundreds of accesses simultaneously without conflicts.
Ideally, no conflicts should arise even when several channels want the
same word. This situation arises, for example, if several ALUs jump
simultaneously to a common subroutine such as a square-root subroutine.

3. A MULTIACCESS SELF-SORTING MEMORY ORGANIZATION
a.  Introduction 

One way to build a memory with m words and n access lines is to use a
matrix or crossbar switch with m rows and n columns. The amount of
equipment in such an arrangement is proportional to mn, a prohibitive
number for memories of reasonable size. Another disadvantage of the
matrix is the fan-out and fan-in required of some of its elements; for
example, there is a gate for each word in memory loading down each
access line. The fan-out and fan-in can be reduced by "treeing" but this
increases the amount of equipment even more.

Reference numbers cited in the text refer to the references listed under
Subhead 6.

Fortunately, other networks of elements exist that perform the same
function as the m-by-n crossbar; these use only $(m/2)\log_2 n$ elements
(approximately) and the fan-in and fan-out required of each element is
constant regardless of m and n. These networks are based on the sorting
and merging techniques discussed in GER-11759 (Reference 2) and
GER-11869 (Reference 3) and described below.

b.  The  Comparison  Element 

Each element in the sorting and merging networks has two inputs, A and
B, and two outputs, L and H, as shown in Figure VI-1. When two items
of data are presented on the inputs, the element compares the two items
as if they were numbers and presents the lower of the two on output L
and the higher on output H. If the two items are equal, the element
presents their common value on both outputs. Bit-serial, bit-parallel,
or serial-parallel data transmission is possible; however, serial
transmission should be done most-significant bit first.

Figure VI-2 shows a 13-NOR comparison element through which data are
transmitted one bit at a time, most-significant bit first. Basically,
this element operates as follows. The B > A and A > B flip-flops are
reset by the reset input and then the data items are presented on A and
B serially, most-significant bit first, interspersed with clock pulses
on the clock input. With the flip-flops reset, the L output is the
logical product (AND) of A and B, while H is the logical union (OR). If
A = B, the clock pulse has no effect. If A = 1 and B = 0, the A > B
flip-flop is set. If A = 0 and B = 1, the B > A flip-flop is set. If the
B > A flip-flop is set, it remains so until the next reset pulse. It
changes the operation of the circuit so that L = A and H = B and it
inhibits the setting of the other flip-flop. The operation is similar
if the A > B flip-flop is set.
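The bit-serial behavior described above can be mimicked in software. The
following is a minimal Python sketch, not a gate-level model of the
13-NOR circuit of Figure VI-2; the function names (to_bits, from_bits,
bit_serial_compare) and the two boolean variables standing in for the
flip-flops are illustrative assumptions, not names from the report.

```python
def to_bits(value, width):
    """Return `value` as a list of bits, most-significant bit first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from an MSB-first bit list."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def bit_serial_compare(a_bits, b_bits):
    """Model one comparison element fed two equal-length bit streams.

    Returns the bit streams seen on the L (lower) and H (higher) outputs.
    """
    a_gt_b = False  # state of the "A > B" flip-flop after reset
    b_gt_a = False  # state of the "B > A" flip-flop after reset
    low, high = [], []
    for a, b in zip(a_bits, b_bits):
        if a_gt_b:            # decision already made: A is the higher item
            l_out, h_out = b, a
        elif b_gt_a:          # decision already made: B is the higher item
            l_out, h_out = a, b
        else:                 # undecided: L = A AND B, H = A OR B
            l_out, h_out = a & b, a | b
        low.append(l_out)
        high.append(h_out)
        # Clock pulse: the first differing bit sets one flip-flop, which
        # then stays set (and blocks the other) until the next reset.
        if not (a_gt_b or b_gt_a):
            if a == 1 and b == 0:
                a_gt_b = True
            elif a == 0 and b == 1:
                b_gt_a = True
    return low, high
```

For instance, feeding the 4-bit streams of 5 and 9 into the sketch
produces the bits of 5 on L and the bits of 9 on H, because the first
differing bit already carries the correct AND/OR values.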

Comparison elements that compare more than one bit of each item at a
time also are possible. Also, it is possible to add a shift-register
stage to each input or to each output so that the element has a
temporary storage function as well as a comparison function.

In the networks to be described, the L and H output of each element will
be connected to an A or B input of another element, hence the load on
the L and H outputs is fixed. This should make it possible to construct
these elements economically; for instance, the logic of Figure VI-2
could be put on one integrated circuit chip so that the elements could
be fabricated in batches.

c. $M_{m,n}$ Merging Networks

The comparison elements can be combined to form a network that can
merge an ordered set of m items with an ordered set of n items to form
an ordered set of m + n items (Figure VI-3); that is, if m items arrive
over input lines $a_1, a_2, \ldots, a_m$ and n items arrive over
$b_1, b_2, \ldots, b_n$ simultaneously and if
$a_1 \le a_2 \le \ldots \le a_m$ and $b_1 \le b_2 \le \ldots \le b_n$,
then the m + n items will be sent out on $c_1, c_2, \ldots, c_{m+n}$
reordered so that $c_1 \le c_2 \le \ldots \le c_{m+n}$. This is called
an $M_{m,n}$ merging network.


The construction of an $M_{m,n}$ merging network is based on the merging
technique described in Reference 2. Basically, the network merges the
set $a_1, a_3, a_5, \ldots$ with the set $b_1, b_3, b_5, \ldots$ in one
subnetwork while merging the set $a_2, a_4, a_6, \ldots$ with the set
$b_2, b_4, b_6, \ldots$ in another subnetwork. The outputs of the two
subnetworks are combined to form the output $c_1, c_2, c_3, \ldots$.
The subnetworks in turn each consist of two subnetworks combined the
same way, etc.


The construction is made more explicit in Figure VI-4, which shows how
two subnetworks are combined to form the larger merging network,
$M_{m,n}$.

Figure VI-4 - Construction of $M_{m,n}$ from Two Subnetworks and a Set
of Comparison Elements

There are four cases depending on the odd-even character of the numbers
m and n. The case where m is even and n is odd is obtained from the
case where m is odd and n is even simply by interchanging the two input
sets; hence, only three cases are illustrated. In all three cases, one
subnetwork receives all the even-indexed items of the input sets while
the other subnetwork receives the odd-indexed items. The respective
outputs of the two subnetworks are compared by a set of comparison
elements (one or two of the subnetwork outputs bypass this stage as
indicated in Figure VI-4) and the outputs of these elements are the
outputs of $M_{m,n}$.

By applying the same procedure to the subnetworks and then to the
sub-subnetworks, etc., the construction is reduced to a set of
$M_{p,1}$ and $M_{1,p}$ merging networks. An $M_{p,1}$ network can be
built by means of the familiar binary search technique; that is, the
item $b_1$ that is to be merged with $a_1, a_2, \ldots, a_p$ is
compared first with $a_{p/2}$ (or thereabouts) and the lower of the two
is merged with $a_1, a_2, \ldots, a_{p/2-1}$, while the higher is merged
with $a_{p/2+1}, \ldots, a_{p-1}, a_p$. Figure VI-5 shows $M_{10,1}$
constructed this way as an example.
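The recursive construction can be tried out in software. The following
Python sketch is one reading of the text, not a transcription of
Figures VI-4 to VI-6: lists stand in for the wires, the odd-indexed and
even-indexed items go to two recursive calls, and the single-item cases
are handled by the binary search scheme. The names compare, insert_one,
merge, element_count and the global counter COUNT are assumptions
introduced for illustration.

```python
COUNT = 0  # comparison elements used so far (illustrative counter)


def compare(x, y):
    """One comparison element: returns (lower, higher)."""
    global COUNT
    COUNT += 1
    return (x, y) if x <= y else (y, x)


def insert_one(a, b):
    """Merge the single item b into the ordered list a by binary search."""
    if not a:
        return [b]
    mid = len(a) // 2
    lo, hi = compare(a[mid], b)
    # The lower item joins the items below a[mid], the higher the items above.
    return insert_one(a[:mid], lo) + insert_one(a[mid + 1:], hi)


def merge(a, b):
    """Merge ordered lists a (m items) and b (n items)."""
    if not a:
        return list(b)
    if not b:
        return list(a)
    if len(b) == 1:
        return insert_one(a, b[0])
    if len(a) == 1:
        return insert_one(b, a[0])
    d = merge(a[0::2], b[0::2])   # a1, a3, ... with b1, b3, ...
    e = merge(a[1::2], b[1::2])   # a2, a4, ... with b2, b4, ...
    out = [d[0]]                  # the lowest output bypasses the last stage
    paired = min(len(e), len(d) - 1)
    for i in range(paired):
        out.extend(compare(e[i], d[i + 1]))
    out.extend(e[paired:])        # at most one leftover output bypasses
    out.extend(d[paired + 1:])
    return out


def element_count(m, n):
    """Number of comparison elements this sketch uses to merge m with n items."""
    global COUNT
    COUNT = 0
    merge(list(range(m)), list(range(m, m + n)))
    return COUNT
```

Under these assumptions, merge([1, 4, 7], [2, 3]) returns
[1, 2, 3, 4, 7], and element_count(2, 6) gives 9 comparison elements.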


Another example of a merging network is shown in Figure VI-6, with the
subnetworks and sub-subnetworks identified by the dotted boxes. A proof
that the above merging networks do in fact "merge" is given in
Reference 2. (An $M_{m,n}$ network corresponds to the operator of this
reference.)

Let h(m, n) be the number of comparison elements in $M_{m,n}$. An exact
expression for h(m, n) is hard to obtain but an idea of how fast it
grows is indicated by

$$h(2^p,\ 2^q - 2^p) = (p + 2)\,2^{q-1} - 2^{p+1} + 1 \qquad (q > p \ge 0).$$

Other special cases of h(m, n) are given in Reference 2.


As can be seen from Figure VI-4, doubling the size of $M_{m,n}$ adds one
level of comparison elements to the network; hence, the longest path
through the network is proportional to the logarithm of the size of the
network; for example, the longest path in $M_{2^p,\ 2^q - 2^p}$ goes
through q comparison elements.

Figure VI-5 - $M_{10,1}$ Merging Network

When the comparison elements include flip-flops for storage of data,
there is a one clock-interval delay in each element and therefore it is
necessary to add extra delays in the shorter paths of the network to
equalize the delay in all paths. Delay elements or extra "waste"
comparison elements can be used for this purpose. As an example, an
$M_{2^p,\ 2^q - 2^p}$ network then would have q levels, with each level
having $2^{q-1}$ comparison elements.


d. Bi-Tonic Merging Networks ($N_{2^q}$)

One disadvantage of $M_{m,n}$ is that it can merge only m items with n
items and thus, for instance, cannot fulfill a need to merge m + 1 items
with n - 1 items. There is another class of merging networks that has
the capability of merging lists of items, regardless of the number of
items in each list, subject only to the constraint that the total number
of items to be merged is a power of two and remains constant.




A sequence of numbers $a_1, a_2, \ldots, a_{2^q}$ is bi-tonic if it is
monotonic or if it consists of two monotonic sequences, one ascending
the other descending, placed side by side (it does not matter whether
the ascending sequence precedes the descending sequence or follows it).
An $N_{2^q}$ bi-tonic merging network such as shown in Figure VI-7 can
rearrange any bi-tonic sequence $a_1, a_2, \ldots, a_{2^q}$ into an
ascending sequence.

For any $q \ge 1$, $N_{2^q}$ can be constructed as follows. If q = 1,
$N_2$ is simply one comparison element; if q > 1, $N_{2^q}$ consists of
two $N_{2^{q-1}}$ networks and $2^{q-1}$ comparison elements connected
as shown in Figure VI-8. A proof that these networks function as stated
is given in Reference 3.

$N_{2^q}$ will have q levels and each level will have $2^{q-1}$
comparison elements. All paths traverse q elements so there is no need
to add extra "waste" elements to equalize path lengths. To use
$N_{2^q}$ as a merging network, one of the input sets should enter
$N_{2^q}$ in ascending order and the other in descending order so that
the total input set $a_1, a_2, \ldots, a_{2^q}$ is a bi-tonic sequence.
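A short software sketch may help make the recursion concrete. It
assumes, following the later remark that the last level pairs an item
from $a_1, a_3, \ldots$ with an item from $a_2, a_4, \ldots$, that the
two half-size networks handle the odd-indexed and even-indexed inputs
and that a final level of comparison elements combines their outputs.
The Python names bitonic_merge and merge_two_lists are illustrative
assumptions, and the single-item base case is a convenience of the
sketch (it reproduces the one-element $N_2$ of the text).

```python
def bitonic_merge(seq):
    """Rearrange a bi-tonic sequence of 2**q items into ascending order."""
    n = len(seq)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(seq)
    odd = bitonic_merge(seq[0::2])    # half-size network fed a1, a3, a5, ...
    even = bitonic_merge(seq[1::2])   # half-size network fed a2, a4, a6, ...
    out = []
    for x, y in zip(odd, even):       # last level: one comparison element per pair
        out.append(min(x, y))
        out.append(max(x, y))
    return out


def merge_two_lists(ascending_a, ascending_b):
    """Merge two ordered lists whose total length is a power of two.

    The second list is fed in descending order so the combined input is
    bi-tonic, as the text prescribes.
    """
    return bitonic_merge(list(ascending_a) + list(ascending_b)[::-1])
```

Note that merge_two_lists accepts any split of the 2**q items between
the two lists, which is the flexibility the text attributes to this
class of networks.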


e.  Sorting  Networks 

The merging networks of Items 3, c and 3, d above can be used to
construct sorting networks by means of the well-known sorting-by-merging
technique; for example, a network to sort $2^q$ items consists of
$2^{q-1}$ comparison elements to arrange the items into $2^{q-1}$
ordered sequences of length 2 followed by $2^{q-2}$ $M_{2,2}$ or $N_4$
networks to merge these sequences into sequences of length 4, etc.

The total number of levels in a sorting network for $2^q$ items is
$\tfrac{1}{2}q(q + 1)$. If M-merging networks are used, the total number
of elements is $(q^2 - q + 4)\,2^{q-2} - 1$, while if N-merging networks
are used the number of elements is $q(q + 1)\,2^{q-2}$.
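The element counts quoted above can be checked with a small software
model. The sketch below is an illustration under stated assumptions, not
the report's own procedure: it sorts by repeated pairwise merging, uses
either an odd-even merge (standing in for the M networks on equal
power-of-two lists) or a bi-tonic merge (standing in for the N networks),
and tallies one comparison element per compare. All function names are
hypothetical.

```python
def odd_even_merge(a, b, count):
    """M-type merge of two ordered lists of equal power-of-two length."""
    if len(a) == 1:
        count[0] += 1
        return [min(a[0], b[0]), max(a[0], b[0])]
    d = odd_even_merge(a[0::2], b[0::2], count)   # odd-indexed items
    e = odd_even_merge(a[1::2], b[1::2], count)   # even-indexed items
    out = [d[0]]                                  # bypasses the last stage
    for x, y in zip(e, d[1:]):
        count[0] += 1
        out += [min(x, y), max(x, y)]
    out.append(e[-1])                             # bypasses the last stage
    return out


def bitonic_merge(seq, count):
    """N-type merge of a bi-tonic sequence of power-of-two length."""
    if len(seq) == 1:
        return list(seq)
    odd = bitonic_merge(seq[0::2], count)
    even = bitonic_merge(seq[1::2], count)
    out = []
    for x, y in zip(odd, even):
        count[0] += 1
        out += [min(x, y), max(x, y)]
    return out


def sort_by_merging(items, use_bitonic):
    """Sort 2**q items; return (sorted list, comparison elements used)."""
    count = [0]
    runs = [[x] for x in items]
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            a, b = runs[i], runs[i + 1]
            if use_bitonic:
                merged.append(bitonic_merge(a + b[::-1], count))
            else:
                merged.append(odd_even_merge(a, b, count))
        runs = merged
    return runs[0], count[0]
```

For 8 items (q = 3) the sketch uses 19 elements with the odd-even merge
and 24 with the bi-tonic merge, in agreement with the two expressions
above.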
f. Separating Networks ($\tilde{N}_{2^q}$)

A merging network combines two ordered sets of items into one ordered
set. It is also desirable to have a network for the inverse operation;
that is, a network that can separate an ordered set of items into two
ordered sets (each item is marked with a flag bit to indicate the
ordered set to which it belongs).


$\tilde{N}_{2^q}$ (for $q \ge 1$) is defined as a bi-tonic separating
network. That is, if $2^q$ items $d_1, d_2, \ldots, d_{2^q}$ are
presented over its inputs with $d_1 \le d_2 \le \ldots \le d_{2^q}$ and
if k of these items are flagged ($0 \le k \le 2^q$), then
$\tilde{N}_{2^q}$ presents the k flagged items on outputs
$e_1, \ldots, e_k$ ordered so that $e_1 \ge e_2 \ge \ldots \ge e_k$, and
presents the $2^q - k$ unflagged items on
$e_{k+1}, e_{k+2}, \ldots, e_{2^q}$ ordered so that
$e_{k+1} \le e_{k+2} \le \ldots \le e_{2^q}$.
$\tilde{N}_{2^q}$ can be constructed by an iterative process that is
analogous to the process used for constructing $N_{2^q}$. Observe from
Figure VI-8 that each element in the last level receives an item from
the set $a_1, a_3, a_5, \ldots, a_{2^q - 1}$ on its A input and an item
from the set $a_2, a_4, a_6, \ldots, a_{2^q}$ on its B input. This
suggests that each element in the first level of $\tilde{N}_{2^q}$
should decide which of its two inputs needs an odd index in the final
output, $e_1, e_2, \ldots, e_{2^q}$, and which needs an even index.

From the definition of $\tilde{N}_{2^q}$, the following rules (where
$1 \le i \le 2^q$) can be established:

1. If i is even and if the set $\{d_i, d_{i+1}, \ldots, d_{2^q}\}$
contains an even number of flagged items, then the item $d_i$
belongs in the set $E = \{e_2, e_4, e_6, \ldots, e_{2^q}\}$.

2. If i is even and if the set $\{d_i, d_{i+1}, \ldots, d_{2^q}\}$
contains an odd number of flagged items, then the item $d_i$
belongs in the set $O = \{e_1, e_3, e_5, \ldots, e_{2^q - 1}\}$.

3. If i is odd, then $d_i$ belongs in the complement to the set
containing $d_{i+1}$; that is, $d_i \in E$ if $d_{i+1} \in O$ and
$d_i \in O$ if $d_{i+1} \in E$.


To establish these rules, consider any integer i with
$1 \le i \le 2^q$. Let t be the number of flagged items in
$d_i, d_{i+1}, \ldots, d_{2^q}$. From the definition of
$\tilde{N}_{2^q}$, $d_i \to e_t$ if $d_i$ is flagged and
$d_i \to e_{i+t}$ if $d_i$ is unflagged. Therefore, if i is even, $d_i$
belongs to E or O depending on whether t is even or odd, respectively.
This establishes rules 1 and 2. The case of i odd is divided into four
subcases, all satisfied by rule 3:

1. If $d_i$ is flagged and $d_{i+1}$ flagged, then $d_i \to e_t$ and
$d_{i+1} \to e_{t-1}$.

2. If $d_i$ is unflagged and $d_{i+1}$ unflagged, then
$d_i \to e_{i+t}$ and $d_{i+1} \to e_{i+t+1}$.

3. If $d_i$ is flagged and $d_{i+1}$ unflagged, then $d_i \to e_t$ and
$d_{i+1} \to e_{i+t}$.

4. If $d_i$ is unflagged and $d_{i+1}$ flagged, then
$d_i \to e_{i+t}$ and $d_{i+1} \to e_t$.

An $\tilde{N}_{2^q}$ network (Figure VI-9) can be built from two
$\tilde{N}_{2^{q-1}}$ networks preceded by a set of separating elements.
An $\tilde{N}_2$ network is simply one separating element. Each of the
separating elements in the first level of $\tilde{N}_{2^q}$ receives two
items of data ($d_i$ and $d_{i+1}$) over its L and H inputs (the first
bit received in each item is its flag bit). It also receives an
indication ($f_i$) of whether the number of flagged items in the set of
inputs extending up to $d_{2^q}$ is odd or even. From rules 1, 2, and 3
it decides whether $d_i \in O$ and $d_{i+1} \in E$ or $d_i \in E$ and
$d_{i+1} \in O$ and presents $d_i$ and $d_{i+1}$ on its O and E outputs
accordingly.

The $f_i$ signals are generated in a network of exclusive-or circuits
that receive the flag bits. Long ripple times can be avoided by using a
"look-ahead" structure similar to the carry-look-ahead structures of
MacSorley (Reference 4).
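The separating rules can be exercised in a small software model. The
sketch below is an interpretation, not the circuit of Figure VI-9: items
are (flagged, value) pairs already in ascending order, a running
exclusive-or over the flag bits plays the role of the parity signals,
the first level routes each pair to an "odd" and an "even" half-size
network according to rules 1 to 3, and separate_direct restates the
definition for comparison. All names are hypothetical, and the
single-item base case is a convenience of the sketch.

```python
def suffix_parity(flags):
    """parity[i] is 1 when an odd number of flags[i:] are set (0-based i)."""
    parity = [0] * (len(flags) + 1)
    for i in range(len(flags) - 1, -1, -1):
        parity[i] = parity[i + 1] ^ int(flags[i])   # chain of exclusive-ors
    return parity


def separate(items):
    """Route ordered (flagged, value) items to outputs e_1 ... e_{2**q}."""
    n = len(items)
    if n == 1:
        return list(items)
    parity = suffix_parity([flag for flag, _ in items])
    odd_set, even_set = [], []
    for j in range(0, n, 2):
        d_i, d_next = items[j], items[j + 1]   # d_i has the odd index i = j + 1
        if parity[j + 1] == 0:
            # Rule 1: d_{i+1} belongs in E, so by rule 3 d_i belongs in O.
            odd_set.append(d_i)
            even_set.append(d_next)
        else:
            # Rule 2: d_{i+1} belongs in O, so by rule 3 d_i belongs in E.
            odd_set.append(d_next)
            even_set.append(d_i)
    out = [None] * n
    out[0::2] = separate(odd_set)     # outputs e_1, e_3, e_5, ...
    out[1::2] = separate(even_set)    # outputs e_2, e_4, e_6, ...
    return out


def separate_direct(items):
    """The definition itself: flagged items descending, then unflagged ascending."""
    flagged = [item for item in items if item[0]]
    unflagged = [item for item in items if not item[0]]
    return flagged[::-1] + unflagged
```

With the ordered inputs 0, 1, 2, 3 and flags on the first and third
items, both functions place the flagged items 2 and 0 on the first two
outputs and the unflagged items 1 and 3 on the last two.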

g. A Multiaccess Memory Using Merging and Separating

The networks described in Items 3, c, d, e, and f above can be combined
to form a multiaccess memory (Figure VI-10). The elements in the
$N_{2^q}$ and $\tilde{N}_{2^q}$ networks incorporate shift-register
stages that store the bits of data in memory. The memory words
recirculate, most-significant bits first, through the $N_{2^q}$ and
$\tilde{N}_{2^q}$ networks via the paths a, b, and c in Figure VI-10.
Each memory word has an address field in its most-significant portion
followed by a data field followed by a control field of p + 2 ONEs.
The words in memory are kept in ascending order; that is, the word with
the highest address field is at the top of memory, etc. Words with equal
address fields are in adjacent locations ordered by their data fields
(in order that words be in algebraic order, a ONEs or TWOs complement
system is used with the sign-bit complemented). Empty words have zeros
in all digits so that they fill the bottom portion of memory. The order
is kept in memory by sorting all new words each cycle and merging them
with the memory words in $N_{2^q}$.

There are q levels in $N_{2^q}$, q levels in $\tilde{N}_{2^q}$, and one
level in the transfer network so the recirculation paths have a delay of
2q + 1 clock pulses. The word length is a multiple of 2q + 1 and
transmission is done in serial-parallel form. Words will recirculate
twice per basic memory cycle and thus the memory cycle time is 4q + 2
clock pulses. In operation, requests arrive over the $2^p$ input
channels (j in Figure VI-10). These have the same format as the memory
words. There are three types of requests: WRITE, READ, and READ AND
ERASE. A WRITE request has ONEs in its control field and has the address
and data of the word to be written in its address and data fields. A
WRITE request does not overwrite old data but rather creates a new word.
A READ request has the bits 01xx...xx in its control field where
xx...xx is the channel number. A READ AND ERASE request has the bits
10xx...xx in its control field where xx...xx is the channel number. The
address and data fields of a READ or READ AND ERASE request indicate the
word to be read. If there is a word in memory whose address and data
fields agree with those of the request, that word is read; otherwise,
the memory word that is immediately higher than the request is read. As
examples, if the data field of a request is all zeros and if there is
one memory word whose address agrees with the request, that memory word
is read; if several words have addresses agreeing with the request, the
one with the least data field is read, etc. A READ AND ERASE request
will erase a memory word after reading it. Any inactive channel will
have a request word of all zeros.
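The request semantics can be summarized in a small functional model.
The Python sketch below is not cycle-accurate and ignores channels and
control fields; it keeps (address, data) pairs in an ordered list and
serves one request at a time, whereas the hardware serves all channels
in one cycle. The class name, method names, the restriction to
non-negative fields, and the choice to return None when no word at or
above the request exists are assumptions of the sketch.

```python
import bisect


class SelfSortingMemory:
    """Functional model: an always-ordered list of (address, data) words."""

    def __init__(self):
        self.words = []  # ascending by address, then by data

    def write(self, address, data):
        """WRITE: create a new word; existing words are never overwritten."""
        bisect.insort(self.words, (address, data))

    def read(self, address, data=0, erase=False):
        """READ or READ AND ERASE.

        Returns the word that agrees with the request if there is one,
        otherwise the word immediately higher than the request. Returns
        None if no such word exists (behavior not specified in the text).
        """
        index = bisect.bisect_left(self.words, (address, data))
        if index == len(self.words):
            return None
        word = self.words[index]
        if erase:
            del self.words[index]
        return word
```

As a usage example, writing the data values 7, 3, and 9 under one
address and then issuing three READ AND ERASE requests with a zero data
field returns 3, 7, and 9 in that order, which is how a set of words is
ordered simply by sharing an address.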


A sorting network orders the requests with the address field taking
precedence over the data field, etc. The ordered requests are presented
to $N_{2^q}$ over the e lines (Figure VI-10) with the highest request
entering the bottom of memory, the next lower request entering the next
higher word, etc. Concurrently, the memory words enter $N_{2^q}$ over c
in ascending order. The $2^p$ empty words enter on the d lines. The
input to $N_{2^q}$ is a bi-tonic sequence (Item 3, d above) and
therefore q clock pulses later the merged requests and memory words
start leaving $N_{2^q}$ in ascending order.


Merging continues until the control field starts to enter the transfer
network. Meanwhile, the address and data fields circulate through
$\tilde{N}_{2^q}$ and back into $N_{2^q}$ via the k and c lines. The
$\tilde{N}_{2^q}$ network is set so that no reordering of the words
occurs. Also, the comparison elements of $N_{2^q}$ are reset as the
words re-enter $N_{2^q}$ so that the ascending order is preserved.


At this point in the memory cycle, the first two bits of each control
field are residing in the transfer network. These bits are 00 for empty
words, 01 for READ requests, 10 for READ AND ERASE requests, and 11 for
memory words. The transfer network remembers this information and
changes these two bits to the code:

01 for unerased memory words

10 for erased memory words and empty words

11 for READ and READ AND ERASE requests

The code change is mechanized easily. An erased memory word can be
distinguished from an unerased memory word since it is located
immediately above a READ AND ERASE request. The first bit of the new
code will be a flag for $\tilde{N}_{2^q}$ to indicate the words to be
removed from memory.


During the next few clock pulses the remainders of the control fields
are fed through the transfer network with no change. As the address and
data fields are fed through the transfer network, the address and data
fields of each READ and READ AND ERASE request are overwritten with the
address and data fields of the first memory word above the request.
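The recoding and overwriting steps of the transfer network can be
sketched in software. The following Python model is an illustration
under assumptions: words are (code, address, data) triples listed from
the bottom of memory upward, channel numbers are omitted, a READ AND
ERASE request is taken to erase the first memory word above it, and a
request with no memory word above it keeps its own fields. The function
name and constants are hypothetical.

```python
EMPTY, READ, READ_ERASE, MEMORY = "00", "01", "10", "11"


def transfer(words):
    """Apply the transfer-network recoding to a merged, ascending word list.

    `words` holds (code, address, data) triples, lowest word first.
    Returns a new list with the recoded two bits and with each request's
    address and data fields replaced by those of the word it reads.
    """
    result = [None] * len(words)
    target = None      # index of the nearest memory word above the current one
    erased = set()
    for i in range(len(words) - 1, -1, -1):    # scan from the top downward
        code, address, data = words[i]
        if code == MEMORY:
            target = i
            result[i] = ("01", address, data)  # provisionally unerased
        elif code == EMPTY:
            result[i] = ("10", address, data)
        else:                                  # READ or READ AND ERASE
            if target is not None:
                _, address, data = words[target]
                if code == READ_ERASE:
                    erased.add(target)
            result[i] = ("11", address, data)
    for i in erased:                           # erased words take the code 10
        _, address, data = result[i]
        result[i] = ("10", address, data)
    return result
```

In the output, the first bit of each code is the flag used by the
separating network (set for requests, erased words, and empty words),
and the second bit distinguishes requests from erased and empty words.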


As the words proceed through $\tilde{N}_{2^q}$, the flagged words are
separated from the unflagged and exit at the bottom of
$\tilde{N}_{2^q}$. The $2^q - 2^{p+1}$ topmost words enter $N_{2^q}$
again via the c lines and a new memory cycle starts.

The $2^{p+1}$ bottommost words contain all requests, all erased words
and some empty words. They are put into $\tilde{N}_{2^{p+1}}$ via the f
lines. $\tilde{N}_{2^{p+1}}$ uses the second control bit as a flag to
separate requests from erased words.


The requests enter a sorting network (now with the control field in the
most-significant place), which resorts the requests by channel numbers.
The sorted requests enter an $N_{2^{p+1}}$ network (via i) along with
the channel numbers (via a separate set of lines in Figure VI-10). In
this network, the comparison elements contain reverse paths along with
forward paths. The comparisons are performed on numbers in the forward
paths but the switching action of each element affects both the forward
paths and the reverse paths. By this means, the request can be fed back
on the correct channel.


Assuming logic elements with 10-nsec propagation delays, a 30,000-word
memory should have a memory cycle time of 8 µsec and an access time of
16 µsec (the access time is longer than the cycle time because of the
time spent in sorting and re-sorting the requests and because it
includes the time to transmit the full requested word). These times
assume 1024 request channels. Because 1024 requests can be processed
every 8 µsec, an effective cycle time of 8 nsec is obtained.
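The effective cycle time quoted above is simply the memory cycle time
divided by the number of requests served per cycle, as the following
worked arithmetic shows:

```latex
\[
  \frac{8\ \mu\text{sec}}{1024\ \text{requests}}
  = \frac{8000\ \text{nsec}}{1024}
  \approx 7.8\ \text{nsec per request} \approx 8\ \text{nsec}.
\]
```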


4.  PARALLEL  COMPUTER  ORGANIZATION 

The multiaccess self-sorting memory of Item 3, g above can be used in
a parallel computer organization. An example parallel processor might
have 300 arithmetic units, each with its own accumulator, quotient
register, index registers, instruction register, program counter, and
request channel to the memory. Because these arithmetic units can be
mass produced, it is expected that each would be considerably cheaper
than an arithmetic unit of a normal computer.

The instruction set of each arithmetic unit is similar to that of a
normal computer: LOAD A, LOAD Q, STORE A, STORE Q, ADD, SUB, MULTIPLY,
DIVIDE, SHIFT, JUMP, etc. There are a few differences, however:

1. Each instruction that reads an operand from memory has two modifying
bits. One bit indicates whether the operand should be erased from memory
or not; that is, whether the read request sent to memory should be READ
or READ AND ERASE. The other bit indicates whether the address of the
operand accepted from memory should agree with the requested address or
not; that is, since the memory always returns an operand for each read
request, the operand address will be different from the requested
address if no word in memory has the requested address. If this occurs,
the second bit indicates whether to reinitiate the request or whether
to use the operand received from memory. Some cases require both modes.

2. Each store instruction creates a new word in memory instead of
overwriting an old word. This makes it possible to store several words
in memory with the same address (a set of words can be ordered simply by
giving each item the same address).

3. Most instructions that read an operand from memory will fetch the
minimum item if there is more than one item with the same address. This
is because the data field of the READ request to memory contains zero.
It is also useful to be able to fetch a word in the middle of a list of
items stored with the same address. Thus, there should be some fetch
operations that use the contents of the accumulator for the data field
of the read request. These operations are similar to threshold searches
in a normal computer, such as the CDC-1604, except that they require
only one memory access time for execution regardless of the length of
the table being searched.

4. Indirect addressing capability will be useful. Some programs can be
executed faster if one arithmetic unit computes "addresses" while
another refers to these by indirect addressing.

5. Indexable jumps are useful since there will be cases where several
arithmetic units may be executing the same subroutine and the return
addresses have to be stored in the arithmetic units themselves.

6. Interrupting capability is useful so one "master" arithmetic unit can
control the others easily. It also allows an arithmetic unit that is
waiting for data to be interrupted and started on a new program.

7. A "skip" on the presence or nonpresence of an address in memory is
useful for synchronizing arithmetic units.

The example processor might have hundreds of I/O channels, each with its
own request channel to memory. There may also be a large backup store
to main memory that uses several request channels so that large blocks
of data can be moved in and out in parallel. With a large, fast-access
backup store, a large main memory is not needed. The I/O equipment can
be controlled by reserving certain addresses for I/O control-word
storage.

This example parallel processor will be able to perform a large class of
programs very fast. The sorting and searching capability allows fast
data retrieval while the parallel arithmetic units allow programs to be
executed in parallel. For example, in the processing of a list
structure, the structure can be gone through in parallel, with a
different arithmetic unit processing each branch and subbranch, etc.
The parallel I/O channels allow fast data input and output, multiconsole
arrangements, fast-access backup stores, etc. With error-detecting
capability in the arithmetic units and interrupt, it is possible to
bypass failed arithmetic units without halting computation, greatly
reducing machine down time.

5.  CONCLUSIONS 

This report shows how a fast multiaccess self-sorting memory can be
constructed. It also gives an example of a parallel processor
organization (Machine I) using this memory. Besides the parallel
arithmetic capability, this organization has the following features:

1. The sorting capability in memory allows fast sorting and table
searching.

2. The parallel I/O channels allow fast data input and output.

3. The organization utilizes content-addressing.

4. It is possible to bypass failed arithmetic units without halting
computation. This will result in greatly reduced downtime.

5. A full complement of arithmetic units is not required for operation
of the machine. Additional arithmetic units can be added later without
changing the programs. As the number of arithmetic units increases, the
machine time (assuming sufficient parallelism in the program) will
decrease.

6. The programmer does not need to assign tasks to each arithmetic unit.
A list of tasks to be performed is stored in memory, each with the same
address. The arithmetic unit(s) take the top item(s) from this list.
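Features 5 and 6 can be illustrated with a toy simulation. The sketch
below is hypothetical throughout: task running times are invented, the
shared task list is modeled with a priority queue rather than the
self-sorting memory, and each free arithmetic unit simply takes the
lowest-numbered task still on the list. It is meant only to show that
the same task list runs unchanged with any number of units.

```python
import heapq


def run_task_list(durations, units):
    """Simulate `units` arithmetic units draining one shared ordered task list.

    durations[k] is the (hypothetical) running time of task k.
    Returns the time at which the last task finishes.
    """
    tasks = list(range(len(durations)))
    heapq.heapify(tasks)                       # the ordered list kept in memory
    free_at = [(0, unit) for unit in range(units)]
    heapq.heapify(free_at)                     # (time the unit is free, unit id)
    finish = 0
    while tasks:
        time, unit = heapq.heappop(free_at)    # next unit to become free
        task = heapq.heappop(tasks)            # take (and erase) the top item
        done = time + durations[task]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, unit))
    return finish
```

With the invented durations [3, 1, 2, 2], the sketch finishes at time 8
with one unit, 5 with two units, and 3 with four units, without any
change to the task list itself.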

The organization of the example parallel processor may be modified if
programming studies indicate the need for other features. The merging
networks in this paper may have an application in any communication
switching problem. They can be made to resemble large crossbar switches
but they have fewer elements.


6.  REFERENCES 

1. Slotnick, D. L., et al.: Solomon. Proceedings of the Fall Joint
Computer Conference, 1962.

2. GER-11759: A New Internal Sorting Method. Akron, Ohio, Goodyear
Aerospace Corporation, 29 September 1963.

3. GER-11869: Bi-Tonic Merging. Akron, Ohio, Goodyear Aerospace
Corporation, December 1964.

4. MacSorley, O. L.: "High-Speed Arithmetic in Binary Computers."
Proceedings of the IRE, January 1961, vol 49, no. 1.




APPENDIX VII - PARALLEL MERGING-SEPARATING MEMORIES


1.  INTRODUCTION 

In a multiprocessor using a sorting memory as a multiaccess memory,
the full sorting capability of the memory is not needed since most of
the memory words remain in the same order from one cycle to the next.
Only new additions, read requests, and erasures cause changes and
these are a small fraction of all the words in memory. This leads to
the concept of using a merging-separating memory in place of a complete
sorting memory. The cycle time of a sorting memory of $2^n$ words is
$\tfrac{1}{2}n(n + 1)$ steps while that for a merging-separating memory
of $2^n$ words is 2n + 2 steps. Thus, a faster cycle time should be
expected in a merging-separating memory (some of its steps will be
longer but there still will be a time advantage).
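To give a feeling for the difference, the two step counts can be
evaluated for one memory size. The value n = 15 (a memory of 32,768
words) is chosen here purely for illustration and does not come from
the report:

```latex
\[
  \tfrac{1}{2}\,n(n+1)\Big|_{n=15} = \tfrac{1}{2}\cdot 15 \cdot 16 = 120
  \ \text{steps (sorting memory)},
  \qquad
  2n + 2\,\Big|_{n=15} = 32
  \ \text{steps (merging-separating memory)}.
\]
```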

2.  FUNCTIONAL  DESCRIPTION  OF  A  MERGING-SEPARATING  MEMORY 

A merging-separating memory cycle has four phases: merging, flagging,
separating, and exchanging (see Figure VII-1). At the beginning of a
merging phase, the set of memory words is divided into two parts, the
higher containing the words of memory left from previous cycles arranged
in numerical order, and the lower containing new additions and read
requests arranged in order. In the merging phase, these two parts are
merged so the old memory words, new additions, and read requests form
one ordered list in memory. In the flagging phase, the contents of the
requested memory words are transferred to the read requests, the read
requests are flagged, and the memory words to be erased (which are
associated with read and erase requests) are flagged.

In the separating phase, the flagged words are separated from the
unflagged, the unflagged memory words left in the higher part of memory
are arranged in order, and the flagged read requests and erasures left in
the lower part are arranged in order. In the exchanging phase, equipment
external to the memory (not shown in Figure VII-1) reads the read requests
and erasures and replaces the lower part of memory with a new ordered
list of new additions and read requests for the next cycle.

The merging and separating phases are most easily realized if the lower
lists are arranged in an order opposite that of the upper lists. A bi-tonic
merge performs the merging and the separating is done by an "inverse"
network (see Appendix VI).

Let the $2^n$ words of the memory be indexed with 0, 1, 2, ..., $2^n - 1$,
with 0 the index of the word at the low end and $2^n - 1$ the index of the word
at the high end. In each step of the merging phase, the $2^n$ words are
formed into $2^{n-1}$ pairs. The two words in each pair are compared and
if the word with the lower index is higher in magnitude than the word with
higher index, the two words are exchanged; otherwise, the pair is left
alone.

The pairing rule is explained easily if the indices are considered as written
in binary form, the bits of the indices are indexed by 1, 2, 3, ..., n (1,
the most-significant bit and n, the least-significant bit), and the steps of
the merging phase are indexed 1, 2, ..., n in sequential order. The pairing
rule is: "On step k, word i is paired with word j if and only if bit k of
index i is not equal to bit k of index j and all other corresponding bits of i
and j are equal."

As  an  example,  in  a  16-word  merging -separating  memory,  the  first  merg¬ 
ing  step  treats  the  following  eight  pairs: 

(0,8)  (1,9)  (2,10)  (3,11)  (4,12)  (5,13)  (6,14)  (7,15). 

The  second  merging  step  treats  these  pairs: 

(0,4)  (1,5)  (2,6)  (3,7)  (8,12)  (9,13)  (10,14)  (11,15). 

The third merging step treats these pairs:

(0,2) (1,3) (4,6) (5,7) (8,10) (9,11) (12,14) (13,15).

The fourth merging step treats these pairs:

(0,1)  (2,3)  (4,5)  (6,7)  (8,9)  (10,11)  (12,13)  (14,15). 
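
The pairing rule and the compare-exchange rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `merge_pairs` and `bitonic_merge` are hypothetical, and the memory is modeled as an ordinary list whose lower half is in descending order and whose upper half is in ascending order.

```python
def merge_pairs(n, k):
    """Pairs (i, j), i < j, treated on merging step k of a 2**n-word memory.

    Bits are numbered 1 (most significant) through n (least significant),
    so bit k of an index has weight 2**(n - k).
    """
    weight = 1 << (n - k)
    return [(i, i + weight) for i in range(1 << n) if not i & weight]


def bitonic_merge(words):
    """Apply the n merging steps to a list of 2**n words (index 0 = low end)."""
    words = list(words)
    n = len(words).bit_length() - 1
    for k in range(1, n + 1):
        for i, j in merge_pairs(n, k):
            # Exchange when the lower-indexed word is higher in magnitude.
            if words[i] > words[j]:
                words[i], words[j] = words[j], words[i]
    return words


if __name__ == "__main__":
    for step in range(1, 5):
        print(step, merge_pairs(4, step))
    print(bitonic_merge([7, 5, 2, 1, 3, 4, 6, 8]))
```

For n = 4 the four calls to `merge_pairs` reproduce the four lists of pairs given above.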

In each step of the separating phase, the $2^n$ words also are formed into
$2^{n-1}$ pairs. The pairing rule is the reverse of that for the merging
phase; that is, the first separating step works with the same pairs as
those considered in the last merging step, the second separating step
corresponds to the next-to-the-last merging step, etc. The two words
in each pair may or may not be exchanged. The exchange rule is explained
as follows. For

$0 \le i \le 2^n - 1$ and $0 \le k \le n - 1$,

$S_{i,k} = \{\, j \mid j > i$ and the least-significant k bits of j equal the
corresponding least-significant k bits of i $\}$.


As an example, if n = 3, then

    i    S(i,0)                   S(i,1)       S(i,2)
    0    {1, 2, 3, 4, 5, 6, 7}    {2, 4, 6}    {4}
    1    {2, 3, 4, 5, 6, 7}       {3, 5, 7}    {5}
    2    {3, 4, 5, 6, 7}          {4, 6}       {6}
    3    {4, 5, 6, 7}             {5, 7}       {7}
    4    {5, 6, 7}                {6}          ∅
    5    {6, 7}                   {7}          ∅
    6    {7}                      ∅            ∅
    7    ∅                        ∅            ∅

where ∅ denotes an empty set. Let $t_{i,k}$ denote the number of flagged
words with indices in $S_{i,k}$ (if $S_{i,k}$ is empty, $t_{i,k} = 0$). The exchange
rule for the pair of words with indices m and $m + 2^{k-1}$ at separating
step k is: "Exchange m and $m + 2^{k-1}$ if and only if $t_{m,k-1}$ is odd."

As an example, suppose words 3, 6, and 7 are flagged in an eight-word
memory. Then $t_{0,0}$, $t_{1,0}$, $t_{2,0}$, and $t_{6,0}$ are odd and $t_{3,0}$, $t_{4,0}$,
$t_{5,0}$, and $t_{7,0}$ are even. In the first separating step, the words in pairs
(0, 1), (2, 3), and (6, 7) are exchanged and the words in pair (4, 5) are
unchanged. The flagged words are now in 2, 6, and 7, so $t_{1,1}$, $t_{2,1}$,
$t_{3,1}$, $t_{4,1}$, and $t_{5,1}$ are odd and $t_{0,1}$, $t_{6,1}$, and $t_{7,1}$ are even.

In the second separating step, the words in pairs (1, 3), (4, 6), and (5, 7)
are exchanged and the words in pair (0, 2) are unchanged. The flagged
words are now in 2, 4, and 5, so $t_{0,2}$ and $t_{1,2}$ are odd and $t_{2,2}$, $t_{3,2}$,
$t_{4,2}$, $t_{5,2}$, $t_{6,2}$, and $t_{7,2}$ are even. In the third separating step,
the words in pairs (0, 4) and (1, 5) are exchanged and the words in pairs
(2, 6) and (3, 7) are unchanged. Figure VII-2 shows the interchanges
performed (underlined indices indicate the flagged words). In this
example, the three flagged words were moved to the low end of memory
with their order reversed and the unflagged words were moved to the
high end with their order preserved.
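
The separating phase can be traced with a short simulation. The sketch below is an illustration, not the hardware method: it assumes the exchange rule reads "exchange m and m + 2^(k-1) when the count of flagged words at indices j > m that agree with m in their k - 1 low-order bits is odd", and it recomputes those counts by brute force at every step. The name `separate` is hypothetical.

```python
def separate(flags, words):
    """Run the n separating steps on a 2**n-word memory.

    flags[m] is True when the word at index m is flagged.  At step k the
    pair (m, m + 2**(k-1)) is exchanged when the number of flagged words
    at indices j > m with j = m (mod 2**(k-1)) is odd.
    """
    size = len(words)
    n = size.bit_length() - 1
    flags, words = list(flags), list(words)
    for k in range(1, n + 1):
        stride = 1 << (k - 1)
        # Parity of the flagged count for every index, taken from the state
        # at the start of the step (all exchanges of a step are simultaneous).
        odd = [sum(flags[j] for j in range(m + stride, size, stride)) % 2 == 1
               for m in range(size)]
        for m in range(size):
            if not m & stride and odd[m]:
                j = m + stride
                words[m], words[j] = words[j], words[m]
                flags[m], flags[j] = flags[j], flags[m]
    return flags, words


if __name__ == "__main__":
    start_flags = [index in (3, 6, 7) for index in range(8)]
    print(separate(start_flags, list(range(8))))
```

Run on the eight-word example with words 3, 6, and 7 flagged, the flagged words finish at the low end in reversed order and the unflagged words finish at the high end in their original order.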


3.  PARALLEL  MEMORY 

In  the  parallel  form  of  a  merging-separating  memory,  there  are  half  as 
many  word  stores  as  words.  Each  word  store  contains  storage  for  two 
words  plus  the  logic  for  the  comparing,  flagging,  and  separating  func¬ 
tions. 

In the n merging + n separating steps of a $2^n$-word memory, words shift
between the word stores so that each of the desired pairs is formed. The
words can be arranged in memory so that the same wires can be used between
each pair of consecutive steps of the merge. An example is shown
in Figures VII-3 and VII-4. In Figure VII-3, the eight pairs in each of the
four steps of a 16-word merge are arranged so that the wiring patterns
between all pairs of consecutive steps are identical; in Figure VII-4,
eight elements are shown interconnected so that they can be used to perform
the same function as the 32 elements of Figure VII-3.

If the words in a $2^n$-word memory are indexed by 0, 1, 2, ..., $2^n - 1$
and if each location is given the same index as the word it contains in the
last merging step, then the location of any word at any step is given by
the rule: "Word i is in location j at step k if the n-bit binary representation
of j is the binary representation of i shifted right end-around n - k
places."

The wiring rule between the locations is: "Location i feeds location j if
the n-bit binary representation of j is the same as that for i shifted left
one place end-around." This shows the wiring necessary for the merging
phase.
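
The location rule and the wiring rule can be checked against each other with end-around shifts. The following sketch is an assumption-laden illustration: the helper names are hypothetical, and it assumes that the two words of a pair occupy locations that differ only in the least-significant bit (that is, locations 2s and 2s + 1 belong to word store s).

```python
def rotate_right(value, places, n):
    """Shift an n-bit value right end-around by the given number of places."""
    places %= n
    mask = (1 << n) - 1
    return ((value >> places) | (value << (n - places))) & mask


def rotate_left(value, places, n):
    """Shift an n-bit value left end-around by the given number of places."""
    return rotate_right(value, n - places % n, n)


def location(word, step, n):
    """Location of the word with index `word` at merging step `step` (1..n)."""
    return rotate_right(word, n - step, n)


if __name__ == "__main__":
    n = 4
    for step in range(1, n + 1):
        weight = 1 << (n - step)          # weight of bit `step` (bit 1 = MSB)
        stores = sorted((location(i, step, n), location(i + weight, step, n))
                        for i in range(1 << n) if not i & weight)
        print(step, stores)
```

At every step each pair lands in two adjacent locations, and moving from one step to the next is a single left end-around shift of the location index, which is why one wiring pattern serves all merging steps.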

Since the steps of the separating phase are in the reverse order, the wires
for separating are given by: "Location i feeds location j if the n-bit
representation of j is the same as that for i shifted right one place
end-around." These would be the same wires as those for merging except
information travels through them in the reverse direction. To simplify
wiring, the same wires may be used for both phases, with the correct
input and output gates being turned on to direct the information correctly
(Figure VII-5). For parallel word transfer, the single wires in Figure
VII-5 actually are busses.

Figure VII-3 - Sixteen-Word Merge Arranged for Same Wiring
Pattern between Each Pair of Levels

Figure VII-8 - Word Store for 36-Bit Words (3-Level Cascade)

Each word store consists of a number of digit stores plus interconnecting
logic. A digit store is shown in Figure VII-6. It stores one digit of each
number, $a_i$ and $b_i$, respectively, in shift register stages. The outputs
$g_i = a_i \bar{b}_i$ and $t_i = a_i \vee \bar{b}_i$ are used in the comparison logic of the
element. The combined input-outputs $l_i$, $e_i$, $o_i$, and $h_i$ connect to the
corresponding input-outputs of other elements. These connections are shown
in the following rules (the elements are numbered 0, 1, 2, ..., $2^{n-1} - 1$):

1. If k is even, then $e_i$ of element k connects to $l_i$ of
element $\tfrac{1}{2}k$ and $o_i$ of element k connects to $l_i$ of
element $\tfrac{1}{2}k + 2^{n-2}$.

2. If k is odd, then $e_i$ of element k connects to $h_i$ of
element $\tfrac{1}{2}(k - 1)$ and $o_i$ of element k connects to
$h_i$ of element $\tfrac{1}{2}(k - 1) + 2^{n-2}$.

3. If $k < 2^{n-2}$, then $l_i$ of element k connects to $e_i$ of
element 2k and $h_i$ of element k connects to $e_i$ of element
2k + 1.

4. If $k \ge 2^{n-2}$, then $l_i$ of element k connects to $o_i$ of
element $2k - 2^{n-1}$ and $h_i$ of element k connects to
$o_i$ of element $2k - 2^{n-1} + 1$.

The other variables in Figure VII-6 are M, which is "on" in the merging
phase; S, which is "on" in the separating phase; X, which is "on" if the
two words should be exchanged (for example, X = 1 if A > B in the merging
phase); and $\bar{X}$, which is "on" if the words should not be exchanged.
The comparison logic and control signals M, S, MX, $M\bar{X}$, SX, and $S\bar{X}$
interconnect the digit stores of a particular element. A fast comparison
circuit can be realized with a "look-ahead" circuit. This technique is
similar to the "carry look-ahead" technique* and consists of a grouping
of the logic in 2-groups or 3-groups, etc. For a comparison circuit, a
p-group (for any $p \ge 2$) is shown in Figure VII-7. It requires p "and"
gates with a total of $2 + 3 + \cdots + p + p = \tfrac{1}{2}(p^2 + 3p - 2)$ gate inputs.
These groups are cascaded to form the comparison logic; Figure VII-8
shows an example of the cascade for 36-bit words. The G output of the
4-group is "on" if and only if A > B. When it is "on," it causes an
exchange of A and B on the outputs (during the merging phase) by means of
MX; if it is "off," $M\bar{X}$ is "on" (during merging) to cause an output with no
exchange. The T outputs of the 4-group and the right-most 3-groups on
each level can be eliminated as they are not used. This is also true of the
t output of the first digit store.
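
The look-ahead comparison can be illustrated in software. The sketch below makes several assumptions that are not taken from the figures: each digit is taken to produce g = a AND (NOT b) and t = a OR (NOT b), a group is taken to form G = g1 OR t1 g2 OR t1 t2 g3 ... and T = t1 t2 ... tp, and the 36-bit cascade is taken to be two levels of 3-groups feeding one 4-group. The function names are hypothetical.

```python
def look_ahead_group(g, t):
    """One p-group; g and t are ordered most-significant digit first.

    G is on when some digit has a > b while every more significant digit
    has a >= b; T is on when every digit of the group has a >= b.
    """
    G, prefix = False, True
    for g_i, t_i in zip(g, t):
        G = G or (prefix and g_i)
        prefix = prefix and t_i
    return G, prefix


def gate_inputs(p):
    """Gate inputs of a p-group: 2 + 3 + ... + p for G, plus p for T."""
    return sum(range(2, p + 1)) + p


def greater_than(a, b, width=36, group_sizes=(3, 3, 4)):
    """Decide a > b for unsigned `width`-bit integers with cascaded groups."""
    bits_a = [(a >> i) & 1 for i in reversed(range(width))]
    bits_b = [(b >> i) & 1 for i in reversed(range(width))]
    g = [bool(x and not y) for x, y in zip(bits_a, bits_b)]
    t = [bool(x or not y) for x, y in zip(bits_a, bits_b)]
    for size in group_sizes:
        grouped = [look_ahead_group(g[k:k + size], t[k:k + size])
                   for k in range(0, len(g), size)]
        g = [G for G, _ in grouped]
        t = [T for _, T in grouped]
    if len(g) != 1:
        raise ValueError("group sizes do not reduce the word to one output")
    return g[0]


if __name__ == "__main__":
    print(greater_than(2 ** 35, 2 ** 35 - 1), greater_than(12, 12))
    print([gate_inputs(p) for p in range(2, 6)])
```

Because the G and T outputs of a group have the same meaning as the g and t outputs of a single digit, the groups can be cascaded to any depth without changing the result.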

In the separating phase, the flag bits of A and B in each word store are
fed into a ring-sum tree. The ring-sum tree generates the control signals
SX and $S\bar{X}$ for each word store. It consists of $2^n - 2$ ring-sum elements
(Figure VII-9), each of which consists of two exclusive-or circuits,
each generating a true and complement output. The logical equations (for
C = 0) are:

$X = x_0 \oplus x_1$,
$y_0 = x_1 \oplus Y$, and
$y_1 = Y$.

When C = 1, Y = 0 and the logic is changed to force $y_0 = 0$. C is a control
signal used to break the ring-sum tree into smaller pieces. An example
ring-sum tree for a 64-word memory is shown in Figure VII-10.

In some places on the figure, the complementary signals are not indicated;
in all connections between ring-sum elements, the $X$ and $\bar{X}$ outputs feed
$x_0$ and $\bar{x}_0$ of another element, respectively, or they feed $x_1$ and $\bar{x}_1$ of
another element. Similarly, the $y_0$ and $\bar{y}_0$ ($y_1$ and $\bar{y}_1$) outputs feed $Y$ and $\bar{Y}$
of another element, respectively. During the first separating step, S is
turned on and $C_1$, $C_2$, $C_3$, $C_4$, and $C_5$ are left off. In the second step,
$C_1$ is turned on (S is left on); this disconnects the ring-sum tree into two
parts. In the third step, $C_1$, $C_2$, and S are "on," disconnecting the tree
into four parts. In the fourth step, $C_1$, $C_2$, $C_3$, and S are "on" and the
tree is in eight parts. $C_1$, $C_2$, $C_3$, $C_4$, and S are "on" in the fifth step,
disconnecting the tree into 16 parts. In the last step, $C_1$, $C_2$, $C_3$, $C_4$,
$C_5$, and S are "on" and the tree is in 32 parts. The way words are
transferred between the steps and the way control signals are turned on causes
each word store to receive the correct exchange signals, SX and $S\bar{X}$ (see
the separating phase discussion in Item 2 above).

*MacSorley, O. E.: "High-Speed Arithmetic in Binary Computers." Proc.
IRE, Vol 49, No. 1, January 1961.
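
The behavior of the ring-sum tree can be modeled recursively. This is a sketch under stated assumptions, not a description of Figure VII-10: each element is assumed to compute X = x0 XOR x1 on the way up and y0 = x1 XOR Y, y1 = Y on the way down; turning on the control signals of the top levels is modeled by forcing both downward outputs of those elements to zero; and the movement of words between locations from step to step is not modeled. The name `ring_sum_tree` and the parameter `cut_levels` are hypothetical.

```python
def ring_sum_tree(flags, cut_levels=0):
    """Parity signal delivered to each leaf of a ring-sum tree.

    flags:       list of 0/1 flag bits, length 2**n, index 0 at the low end
    cut_levels:  number of levels, counted from the root, whose control
                 signal C is on (0 leaves the tree in one piece)
    """
    size = len(flags)
    out = [0] * size

    def up(lo, hi):
        """Ring sum (exclusive-or) of flags[lo:hi], the X output of a subtree."""
        if hi - lo == 1:
            return flags[lo]
        mid = (lo + hi) // 2
        return up(lo, mid) ^ up(mid, hi)

    def down(lo, hi, Y, level):
        """Distribute the downward signal Y over the subtree flags[lo:hi]."""
        if hi - lo == 1:
            out[lo] = Y
            return
        mid = (lo + hi) // 2
        if level < cut_levels:            # C = 1: the tree is broken here
            y0, y1 = 0, 0
        else:                             # C = 0
            y0, y1 = up(mid, hi) ^ Y, Y
        down(lo, mid, y0, level + 1)
        down(mid, hi, y1, level + 1)

    down(0, size, 0, 0)
    return out


if __name__ == "__main__":
    print(ring_sum_tree([0, 0, 0, 1, 0, 0, 1, 1]))
    print(ring_sum_tree([1, 0, 1, 0, 0, 1, 0, 0], cut_levels=1))
```

With the tree in one piece, each leaf receives the parity of the number of flags at higher indices; with the top levels cut, the same holds separately inside each part.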

On the first separating step, the longest path in the ring-sum tree for $2^n$
words goes through 2n - 1 logic elements; on the second step, it goes
through 2n - 3 elements; on the third step it goes through 2n - 5 elements,
etc. To decrease the cycle time to a minimum, a special clock with a
long interval can be used during the first separating step, a shorter
interval during the second separating step, etc.

In the flagging phase, the words to be separated are flagged and the
contents of words are transferred to the read requests. There may be several
read requests bunched together, reading the same word. It would take an
inordinate amount of logic to transfer the word in parallel to all such
requests, so in this situation only the topmost read request (the request just
below the word being read) receives the data. After the separating phase,
all such read requests will still be together and the topmost request can
then send the data to all the others.

A control field in the low-order bits of each word identifies the word as a
memory word, a read request, a read and erase request, or an erase
limit. For read requests and read and erase requests, the control field
also identifies the particular output channel involved. It is desirable to
arrange the control field codes so that for erase limits they are above
(when read as binary numbers) those for memory words, which in turn
are above those for the read requests and the read and erase requests.

A good control field code then is:

    C1   C2   C3 ... Cn         Word type
    1    1    x x ... x x       Erase limit
    1    0    x x ... x x       Memory word
    0    1    Channel number    Read and erase request
    0    0    Channel number    Read request

In the flagging phase, the following are to be flagged:

1. Erase limits

2. Read and erase requests

3. Read requests

4. Memory words just above read and erase requests.

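If $C_2$ is picked for the flag, then the substitution for the flag bit of the
i-th word during flagging is:

$$C_2^{(i)} \leftarrow C_2^{(i)} \vee \bar{C}_1^{(i)} \vee \left[ \bar{C}_1^{(i-1)} \wedge C_2^{(i-1)} \right]$$

The first term keeps erase limits and read and erase requests flagged, the
second term flags every request, and the bracketed term flags a memory word
whose lower neighbor is a read and erase request. $C_1$ is left alone so that
it can be used to separate the requests from the erasures and erase limits
after all these words have been separated from the other memory words.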
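
The four flagging cases can be checked directly from the control field table. The sketch below is illustrative only: it represents each word by its (C1, C2) pair, orders the list from the low end of memory upward, and uses the hypothetical name `flag_words`. All new flag bits are computed from the old control fields, as they would be in a single parallel step.

```python
ERASE_LIMIT = (1, 1)
MEMORY_WORD = (1, 0)
READ_AND_ERASE = (0, 1)
READ_REQUEST = (0, 0)


def flag_words(control):
    """Return the new C2 (flag) bit of every word.

    control: list of (C1, C2) pairs, index 0 at the low end of memory.
    A word is flagged if C2 is already on, if it is a request (C1 off),
    or if the word just below it is a read and erase request.
    """
    flags = []
    for i, (c1, c2) in enumerate(control):
        below_is_read_and_erase = i > 0 and control[i - 1] == READ_AND_ERASE
        flags.append(int(bool(c2 or not c1 or below_is_read_and_erase)))
    return flags


if __name__ == "__main__":
    memory = [READ_REQUEST, MEMORY_WORD, MEMORY_WORD,
              READ_AND_ERASE, MEMORY_WORD, ERASE_LIMIT]
    print(flag_words(memory))
```

In the usage example only the two memory words that are read but not erased, or not requested at all, remain unflagged.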

The memory words can be transferred to read requests by writing the
whole memory word (except its control field) into the read request, leaving
the read request control field alone (the read request is just below the
memory word). Parallel transfer gates from the A word of each word
store to the B word of the same word store and gates from the B word of
each word store to the A word of the next lower word store are needed
for this. This involves much wiring.

The flagging phase takes one time step. The exchange phase consists of
one time step during which all separated words are transferred out of
memory and replaced with new requests, memory words, or blank words.
Checks are made to inhibit writing over any memory word.

4.  CONCLUSIONS 

A parallel merging-separating memory has been described. It has the
advantage over a complete sorting memory of taking fewer time steps. Its
operation is faster than that of a serial memory because whole words are
treated at once; this time advantage is about 2 to 1. The wiring will be more
complex than in a serial memory and the cost will be higher because of
this and also because there are many more different kinds of elements
than in a serial memory.
APPENDIX VIII - PROBLEM SELECTION FOR A PARALLEL PROCESSOR


1.  INTRODUCTION 

This  appendix  presents  some  of  the  analytical  results  obtained  in  the  se¬ 
lection  of  problems  for  implementation  on  a  parallel  processor  (see  Ap¬ 
pendix  VI).  Parallel  execution  of  the  following  mathematical  methods  is 
discussed:  Jacobi's  method  of  eigenvalue  determination,  relaxation  solu¬ 
tion  of  a  system  of  linear  algebraic  equations,  and  numerical  solution  of 
Laplace's  equation. 

2.  JACOBI'S  METHOD 
a.  Discussion 


(1)  General 

Jacobi's  method  is  a  mathematical  technique  for  finding  the  eigenvalues 
and  eigenvectors  of  a  real  symmetric  matrix.  The  method  is  based  on 
the  following  well-known  theorem  from  matrix  algebra. 

(2)  Theorem  1 

Let $A = (a_{ij})$ be an n × n real symmetric matrix. Then there exists an
orthogonal matrix U such that

$$U'AU = D(\lambda_1, \lambda_2, \ldots, \lambda_n) = D , \qquad (1)$$

where U' denotes the transpose of U; $D = D(\lambda_1, \lambda_2, \ldots, \lambda_n)$ denotes a
diagonal matrix; and $\{\lambda_i\}$ (i = 1, 2, ..., n) are the eigenvalues of
A. Since in (1) U is orthogonal,

$$AU = UD \qquad (2)$$

and hence the columns of U are the eigenvectors of A.

Jacobi's method specifies the construction of a sequence of orthogonal
matrices $T_1, T_2, \ldots, T_k$ such that

$$T_k' T_{k-1}' \cdots T_1' \, A \, T_1 T_2 \cdots T_k = C , \qquad (3)$$

where C is an n × n matrix whose off-diagonal elements are arbitrarily
close to zero and whose diagonal elements are arbitrarily close to the
eigenvalues of A. The columns of the matrix $T_1 T_2 \cdots T_k$ are then
arbitrarily close to the eigenvectors of A.

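The sequence of matrices $T_1, T_2, \ldots, T_k$ is constructed as follows.

(3) Construction of $T_1$

From the elements above the main diagonal of A select the one of largest
magnitude, say $a_{ij}$. Then define

$$\tan 2\theta = \frac{-a_{ij}}{\tfrac{1}{2}(a_{ii} - a_{jj})} . \qquad (4)$$

Letting

$$c = \cos\theta \quad \text{and} \quad s = \sin\theta , \qquad (5)$$

$T_1$ is defined as $T_1 = (t_{pq})$, where

$$t_{pq} = \begin{cases} c & \text{if } p = q = i \text{ or } p = q = j \\ s & \text{if } p = i,\ q = j \\ -s & \text{if } p = j,\ q = i \\ 1 & \text{if } p = q \ne i \text{ or } j \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$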


More  simply, 




where

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 , \end{cases}$$


then  one  can  write: 


s  -  sin  6  = 


21  +  V!  . 


$c = \cos\theta = \sqrt{1 - \sin^2\theta}$ .

Hence,  the  computation  of  s  and  c  involves  only  algebraic  relationships 
and  no  computation  of  trigonometric  functions  is  required. 
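
One way to obtain s and c with only arithmetic and square roots is through the half-angle identities. The sketch below is not a transcription of the equations in the text: the names `lam`, `mu`, and `omega` are hypothetical, the defining relation is assumed to be tan 2θ = -a_ij / (½(a_ii - a_jj)), and the sign of zero is taken as +1 so that the case a_ii = a_jj is covered.

```python
import math


def rotation(a_ii, a_jj, a_ij):
    """Return (s, c) = (sin θ, cos θ) without evaluating trigonometric functions.

    Assumes tan 2θ = -a_ij / (0.5 * (a_ii - a_jj)) with 2θ between -90 and
    +90 degrees, so that cos 2θ >= 0.
    """
    lam = -a_ij
    mu = 0.5 * (a_ii - a_jj)
    if lam == 0.0:
        return 0.0, 1.0                       # nothing to rotate
    sign_mu = 1.0 if mu >= 0.0 else -1.0      # sgn(0) is taken as +1 here
    omega = sign_mu * lam / math.hypot(lam, mu)            # omega = sin 2θ
    cos_2theta = math.sqrt(max(0.0, 1.0 - omega * omega))
    s = omega / math.sqrt(2.0 * (1.0 + cos_2theta))        # half-angle identity
    c = math.sqrt(1.0 - s * s)
    return s, c


if __name__ == "__main__":
    a_ii, a_jj, a_ij = 4.0, 1.0, 2.0
    s, c = rotation(a_ii, a_jj, a_ij)
    # The rotated i, j element is c*s*(a_ii - a_jj) + (c*c - s*s)*a_ij.
    print(s, c, c * s * (a_ii - a_jj) + (c * c - s * s) * a_ij)
```

The printed third value is the i, j element after the rotation, which should vanish to within rounding error.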

(4) Construction of $T_{k+1}$

Assuming $T_1, T_2, \ldots, T_k$ have been computed, define

$$A_k = T_k' T_{k-1}' \cdots T_1' \, A \, T_1 T_2 \cdots T_k . \qquad (11)$$

Then select from the elements above the main diagonal of $A_k$ the one of
largest magnitude, and calculate the elements of $T_{k+1}$ in the fashion
specified by (4) through (10).

That matrices $T_i$ of the type (7) are orthogonal is easily seen by forming
$T_i' T_i = I$. It is also evident that $T_{k+1}' A_k T_{k+1}$ is a real symmetric
matrix if $A_k$ is, for if it is assumed that A is real and symmetric, $T_1' A T_1$
is obviously real. Further,

$$(T_1' A T_1)' = T_1' A' T_1 = T_1' A T_1 ,$$

and hence $T_1' A T_1$ is symmetric. The general case follows by induction.
It is easily seen that premultiplying a matrix $A_k$ by $T_{k+1}'$ results in a
matrix $T_{k+1}' A_k$ that is identical to $A_k$ except in the i-th and j-th rows,
i and j being determined by the above-diagonal term of $A_k$ having the largest
magnitude. Similarly, postmultiplying a matrix $T_{k+1}' A_k$ by $T_{k+1}$ results
in a matrix $T_{k+1}' A_k T_{k+1}$ that is identical to $T_{k+1}' A_k$
except in the i-th and j-th columns. A little arithmetic will show that the i, j
and j, i elements of $A_{k+1}$ are zero. It may be, of course, that the "i, j"
and "j, i" spots previously zeroed out in forming $A_k$ no longer will be zero
in $A_{k+1}$.

However, if $\tau^2(A_k)$ is defined as the sum of the squares of the off-diagonal
terms of the matrix $A_k$, it can be shown$^{1,a}$ that

$$\tau^2(A_{k+1}) < \tau^2(A_k)$$

and hence that the sequence $A, A_1, A_2, \ldots, A_k$ generated by the Jacobi
method converges to $D = D(\lambda_1, \lambda_2, \ldots, \lambda_n)$, the diagonal matrix of the
eigenvalues of A, and that the columns of $T_1 T_2 \cdots T_k$ converge to the
eigenvectors of A.
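
The whole iteration can be sketched in plain Python. This is an illustrative serial version under stated assumptions: the function names are hypothetical, the rotation angle is obtained with `math.atan2` for brevity rather than by the algebraic formulas, the matrix is assumed real, symmetric, and of order at least 2, and only the eigenvalues are returned.

```python
import math


def off_diagonal_sum(a):
    """Sum of the squares of the off-diagonal elements."""
    n = len(a)
    return sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)


def jacobi_eigenvalues(a, eps=1e-20, max_rotations=1000):
    """Classical Jacobi iteration on a real symmetric matrix (list of rows).

    Returns the sorted diagonal of the final matrix and the history of the
    off-diagonal sums of squares, one entry per rotation.
    """
    a = [row[:] for row in a]
    n = len(a)
    history = [off_diagonal_sum(a)]
    for _ in range(max_rotations):
        if history[-1] < eps:
            break
        # Above-diagonal element of largest magnitude.
        i, j = max(((p, q) for p in range(n) for q in range(p + 1, n)),
                   key=lambda pq: abs(a[pq[0]][pq[1]]))
        theta = 0.5 * math.atan2(-2.0 * a[i][j], a[i][i] - a[j][j])
        c, s = math.cos(theta), math.sin(theta)
        for q in range(n):                 # premultiply by T': rows i and j
            a_iq, a_jq = a[i][q], a[j][q]
            a[i][q] = c * a_iq - s * a_jq
            a[j][q] = s * a_iq + c * a_jq
        for p in range(n):                 # postmultiply by T: columns i and j
            a_pi, a_pj = a[p][i], a[p][j]
            a[p][i] = c * a_pi - s * a_pj
            a[p][j] = s * a_pi + c * a_pj
        history.append(off_diagonal_sum(a))
    return sorted(a[p][p] for p in range(n)), history


if __name__ == "__main__":
    values, history = jacobi_eigenvalues([[4.0, 1.0, 2.0],
                                          [1.0, 3.0, 0.0],
                                          [2.0, 0.0, 1.0]])
    print(values)
    print(history)
```

The printed history shows the off-diagonal sum of squares falling with every rotation, which is the behavior the convergence argument relies on.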


b.  Parallel  Execution 

The method of Jacobi, as outlined above, lends itself well to parallel
computation. Matrix operations are, of course, well suited for parallel
computation. As an example, consider the product C = AB of two matrices
$A = (a_{ij})$, $B = (b_{ij})$. Now in C, the i, j element is

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} . \qquad (14)$$

That is, the element in the i, j spot of C is just the dot product of the i-th
row of A and the j-th column of B. Clearly, given A and B, each of the
elements of C may be calculated independently of the others. And for each
element of C the multiplications involved in the corresponding dot product
may be done in parallel and the summing involved may be treed. The
extensive matrix operations involved in the Jacobi method are then well
suited to parallel computation.

$^a$Superior numbers in the text refer to references under Subhead 7 on Page 220.
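
The "treed" summation can be illustrated by adding the products in pairs, so that n products are summed in about log2(n) levels instead of n - 1 sequential additions. The sketch below runs serially and only counts the levels; the names `tree_sum` and `matmul_element` are hypothetical.

```python
def tree_sum(values):
    """Pairwise (treed) summation; returns (total, number of levels)."""
    values = list(values)
    levels = 0
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)              # pad so every value has a partner
        # All additions within one level are independent of each other.
        values = [values[k] + values[k + 1] for k in range(0, len(values), 2)]
        levels += 1
    return (values[0] if values else 0), levels


def matmul_element(a, b, i, j):
    """Element i, j of the product AB as a treed dot product."""
    products = [a[i][k] * b[k][j] for k in range(len(b))]   # independent
    return tree_sum(products)[0]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print([[matmul_element(a, b, i, j) for j in range(2)] for i in range(2)])
    print(tree_sum(range(8)))
```

On a machine with enough processing elements, every element of C and every addition within a level could be executed at the same time.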

There are two computational aspects of the Jacobi method for which
capabilities resident in parallel processors having sorting memories are
ideally suited (see Appendix VI). These aspects are (1) the determination
for a matrix $A_k$ of the above-diagonal element of largest magnitude,
and (2) the test for convergence, namely,

$$\tau^2(A_k) < \varepsilon \qquad (15)$$

for some given epsilon.

The test (15) for convergence may be replaced by requiring that the
magnitude of the largest off-diagonal element of $A_k$ be less than some given
epsilon.

Since each of the two computational aspects cited above involves the
determination of the largest member of a given set of elements, the rapid
sort capability of Machines I or II (Appendixes VI and XV) may be profitably
brought to bear in their execution.

3.  THE  RELAXATION  TECHNIQUE 
a.  Discussion 

Relaxation is a term originally applied by R. V. Southwell to a class of
iterative methods for solving a system of linear equations. The term
has since come to connote a broad class of methods for the approximate
reformulation of physical problems in terms of systems of linear equations
to be solved. An example of this expanded use of the term relaxation
is offered under Item 4 below, where a numerical solution to Laplace's
equation is discussed. In the strict sense, the relaxation technique
provides a method for solving a system of linear algebraic equations
expressed in matrix form as

$$AX = B , \qquad (16)$$

where A is an n × n coefficient matrix of known constants, $X = (x_1, x_2,
x_3, \ldots, x_n)$ is a column vector of unknowns, and $B = (b_1, b_2, \ldots,
b_n)$ is a column vector of known constants.

The relaxation technique is an iterative procedure that specifies a sequence
$X_1, X_2, \ldots, X_k$, where $X_i = (x_1^i, x_2^i, \ldots, x_n^i)$, of approximations
that converge to the solution vector X. Discussions of necessary
and sufficient conditions for convergence may be found in references 1
through 4. The technique assumes an initial guess, $X_1$, and computes
successively vectors $R_i = (r_1^i, r_2^i, \ldots, r_n^i)$ of "residuals" defined as

$$R_i = B - AX_i \qquad (17)$$

for i = 1, 2, ..., k.

The residual vector $R_i$ provides a measure of the closeness of the
approximation $X_i$ to X. Based on a residual vector $R_i$, the relaxation technique
specifies a new approximation $X_{i+1}$. The process continues until the
elements of the residual vector are sufficiently close to zero to satisfy
a pre-established convergence criterion such as $R_i \cdot R_i < \varepsilon$ or

$$\max_k |r_k^i| < \varepsilon .$$


Given a residual vector $R_i = (r_1^i, r_2^i, \ldots, r_n^i)$, the relaxation procedure
specifies a new approximation $X_{i+1}$ of the form

$$X_{i+1} = X_i + \lambda_p U_p ,$$

where $U_p$ is the p-th coordinate vector, namely

$$U_p = (\delta_{1p}, \delta_{2p}, \ldots, \delta_{np}) , \qquad (18)$$

and $\lambda_p$ is a constant to be chosen such that the p-th element, $r_p^{i+1}$, of the
residual vector $R_{i+1} = B - AX_{i+1}$ is zero. p may be specified in a
cyclic order, for example in terms of any permutation of the integers 1, 2,
..., n (n being the order of the matrix A), or according to some
predetermined criterion. The process of choosing $X_{i+1} = X_i + \lambda_p U_p$ in terms
of a cyclic determination of p is known as the Gauss-Seidel iteration.


More rapid convergence of the relaxation technique is obtained if $\lambda_p$ is
chosen so that

$$|\lambda_p| = \max_{1 \le k \le n} |\lambda_k| \qquad (20)$$

rather than specifying p in a cyclic fashion.


It is possible to determine $\lambda_p$, where p = 1, 2, ..., n, as follows: For
given p and present approximation $X_i = (x_1^i, x_2^i, \ldots, x_n^i)$, the requirement
that $r_p^{i+1} = 0$ means that if the residual vector $R_{i+1} = (r_1^{i+1},
r_2^{i+1}, \ldots, r_n^{i+1})$, then the dot product $R_{i+1} \cdot U_p = 0$.

Now $R_{i+1} = B - AX_{i+1}$ and $X_{i+1} = X_i + \lambda_p U_p$.

Hence $(B - AX_{i+1}) \cdot U_p = 0$ and

$$\lambda_p = \frac{b_p - \sum_{k=1}^{n} a_{pk} x_k^i}{a_{pp}} = \frac{r_p^i}{a_{pp}} . \qquad (21)$$

Letting

$$\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n) , \qquad (22)$$

then

$$\Lambda = (r_1^i / a_{11}, \ldots, r_n^i / a_{nn}) . \qquad (23)$$

At each stage of the relaxation iteration, a new approximation to the
solution vector X is specified in terms of the last approximation and the
element of (23) having maximum magnitude.
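
The iteration just described can be sketched as follows. This is a minimal serial illustration under stated assumptions: the name `relax` is hypothetical, the stopping rule is the maximum-residual criterion, and the test system is chosen to be symmetric and diagonally dominant so that convergence is not in question.

```python
def relax(a, b, x, eps=1e-10, max_steps=10000):
    """Relaxation with the largest-magnitude correction chosen at each step.

    a: coefficient matrix as a list of rows, b: right-hand side,
    x: initial guess.  Returns (approximation, number of steps taken).
    """
    n = len(b)
    x = list(x)
    for step in range(max_steps):
        # Residual vector R = B - AX; its elements are independent.
        r = [b[p] - sum(a[p][k] * x[k] for k in range(n)) for p in range(n)]
        if max(abs(value) for value in r) < eps:
            return x, step
        # Candidate corrections r_p / a_pp, one for each coordinate.
        lam = [r[p] / a[p][p] for p in range(n)]
        # Search for the correction of largest magnitude and apply it.
        p = max(range(n), key=lambda q: abs(lam[q]))
        x[p] += lam[p]
    return x, max_steps


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    print(relax(a, b, [0.0, 0.0, 0.0]))
```

The residual computation and the division by the diagonal are elementwise and independent, while the choice of p is a search for the largest member of a set.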
The rate of convergence of the relaxation process may be increased by
modifying the value of $\lambda_p$ used in the iteration. If $\lambda_p$ is replaced by

$$w \lambda_p ,$$

it is known that for

$$0 < w < 2$$

convergence of a relaxation iteration is preserved. The factor w is called
an acceleration parameter or relaxation factor. The term under [over]
relaxation is applied to the case where 0 < w < 1 [1 < w < 2]. (It must be
stressed here that the acceleration parameter w is used to accelerate, not
establish, convergence of a relaxation iteration.) The central problem
associated with the use of an acceleration parameter w is to determine
the optimal value, $w_{opt}$, for w; that is, the value of w for which the
convergence rate of the relaxation iteration is maximized. The theoretical
determination of $w_{opt}$ for the relaxation solution of a system of equations
expressed in matrix form as

$$AX = B \qquad (26)$$

proceeds as follows. Let the matrix A be represented as

$$A = E + D + F ,$$

where E, D, and F are lower triangular, diagonal, and upper triangular
matrices. Defining a matrix H as

$$H = -(D + E)^{-1} F$$

and denoting by S(H) the spectrum of H, then compute

$$\mu = \max\,[S(H)] .$$

That is, $\mu$ is the eigenvalue of H having the largest magnitude. If the
relaxation iteration is convergent for the system (26), then $|\mu| < 1$ (ref 2)
and $w_{opt}$ may be computed as
(31)


Observe that implementation of the derivation of $w_{opt}$ presented above
involves the solution of an eigenvalue problem that may be at least as
difficult as the original problem. Forsythe makes the discouraging
observation that no generally acceptable technique for accurately approximating
$w_{opt}$ as the relaxation iteration proceeds is known. Householder recently
confirmed this observation.


b. Parallel Execution

The relaxation method outlined above involves the repeated execution of
the operations of matrix multiplication and addition, multiplication of a
vector by a scalar, and searching a set for the element of largest magnitude.
As was pointed out under Item 2, b above, these operations are well
suited to parallel execution, and the operation of finding in a set the
element of largest magnitude may be rapidly accomplished on a parallel
processor having sorting capability.


4.  NUMERICAL  SOLUTION  TO  LAPLACE'S  EQUATION 
a.  Discussion 

The  numerical  solution  of  Laplace's  equation  over  a  rectangular  region  R 
with  boundary  TR  is  discussed  here.  Assume  that  R  is  partitioned  by  an 
equally  spaced  rectangular  mesh  and  that  Dirichlet  boundary  conditions 
are specified. Given a function u(x, y) for which Laplace's equation
obtains over R, write

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 . \qquad (32)$$

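Letting the interval for the mesh over R be denoted by $\Delta$, the partial
derivatives for u(x, y) may be approximated by

$$
\begin{aligned}
\frac{\partial u}{\partial x} &\approx \frac{u(x + \Delta, y) - u(x, y)}{\Delta} \\
\frac{\partial u}{\partial y} &\approx \frac{u(x, y + \Delta) - u(x, y)}{\Delta} \\
\frac{\partial^2 u}{\partial x^2} &\approx \frac{u(x + \Delta, y) - 2u(x, y) + u(x - \Delta, y)}{\Delta^2} \\
\frac{\partial^2 u}{\partial y^2} &\approx \frac{u(x, y + \Delta) - 2u(x, y) + u(x, y - \Delta)}{\Delta^2}
\end{aligned}
\qquad (33)
$$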


and then the difference equation counterpart of (32) may be written as

$$u(x, y) = \tfrac{1}{4}\left[u(x + \Delta, y) + u(x - \Delta, y) + u(x, y + \Delta) + u(x, y - \Delta)\right] . \qquad (34)$$

Equation (34) approximates u(x, y) at each interior mesh point of R by the
average of "north, south, east, west neighbors." Other difference equation
approximations to u(x, y) at interior points of R are

$$u(x, y) = \tfrac{1}{4}\left[u(x + \Delta, y + \Delta) + u(x + \Delta, y - \Delta) + u(x - \Delta, y + \Delta) + u(x - \Delta, y - \Delta)\right] \qquad (35)$$

and

$$u(x, y) = \tfrac{1}{20}\Big\{4\left[u(x + \Delta, y) + u(x - \Delta, y) + u(x, y + \Delta) + u(x, y - \Delta)\right] + u(x + \Delta, y + \Delta) + u(x + \Delta, y - \Delta) + u(x - \Delta, y + \Delta) + u(x - \Delta, y - \Delta)\Big\} . \qquad (36)$$


Approximations (34) and (35) are often referred to as "five-point"
formulas and (36) as a "nine-point" formula. It is easily seen that
approximations (34), (35), and (36) represent u(x, y) in terms of +, X,
and □ patterns of neighbors, respectively, and they are referred to here
as approximations A, B, and C.

An easily established relation between approximations A, B, and C [(34),
(35), and (36)] is given by

$$C = \tfrac{4}{5} A + \tfrac{1}{5} B . \qquad (37)$$
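The three approximations and relation (37) can be checked numerically. The sketch below is illustrative only: the function names, the optional spacing argument `d` (in mesh steps), and the sample patch of values are assumptions, not part of the report.

```python
def approx_a(u, i, j, d=1):
    """Approximation A: average of the four axial ("+") neighbors at distance d."""
    return 0.25 * (u[i + d][j] + u[i - d][j] + u[i][j + d] + u[i][j - d])

def approx_b(u, i, j, d=1):
    """Approximation B: average of the four diagonal ("X") neighbors at distance d."""
    return 0.25 * (u[i + d][j + d] + u[i + d][j - d]
                   + u[i - d][j + d] + u[i - d][j - d])

def approx_c(u, i, j, d=1):
    """Approximation C: nine-point formula, axial neighbors weighted 4, diagonal 1."""
    axial = u[i + d][j] + u[i - d][j] + u[i][j + d] + u[i][j - d]
    diagonal = (u[i + d][j + d] + u[i + d][j - d]
                + u[i - d][j + d] + u[i - d][j - d])
    return (4.0 * axial + diagonal) / 20.0

# Arbitrary 3-by-3 patch of values; the center entry itself is not used.
patch = [[1.0, 2.0, 4.0],
         [8.0, 0.0, 16.0],
         [32.0, 64.0, 128.0]]
a = approx_a(patch, 1, 1)
b = approx_b(patch, 1, 1)
c = approx_c(patch, 1, 1)
print(a, b, c, 0.8 * a + 0.2 * b)
```

Because the weights 4/20 and 1/20 of (36) equal (4/5)(1/4) and (1/5)(1/4), the last two printed values agree for any patch.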

Iterative solutions to Laplace's equation based on approximations A, B,
or C converge (ref 3) and often are called "relaxation solutions." A
sequential iterative solution would proceed by ordering the interior mesh
points of a region R and cyclically applying one of the approximations A,
B, or C over the ordering until some specified convergence criterion is
met. In a sequential pass over the ordered interior mesh points of R, two
possibilities for updating the values for u(x, y) at each interior mesh
point are available: (1) as each new approximation to u(x, y) is generated
at a point, it is made available for subsequent calculations in the pass,
and (2) each pointwise approximation to u(x, y) made in a given pass uses
only point values available at the end of the preceding pass. The former
[latter] method of updating often is called the method of successive
[simultaneous] displacements.
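The two updating rules differ only in where a pass reads its neighbor values. A minimal sketch using approximation A follows, assuming a 9-by-9 mesh stored as a list of rows with zero boundary values and a row-by-row ordering of the interior points; the function names are illustrative.

```python
def make_mesh(n=9, interior=1.0):
    """n-by-n mesh with zero boundary values and a constant interior start value."""
    u = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            u[i][j] = interior
    return u

def simultaneous_pass(u):
    """Possibility (2): every point reads only values from the preceding pass."""
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                + u[i][j + 1] + u[i][j - 1])
    return new

def successive_pass(u):
    """Possibility (1): each new value is used as soon as it is generated."""
    n = len(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            u[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                              + u[i][j + 1] + u[i][j - 1])
    return u

def total_error(u):
    """Sum of absolute values; the true solution for zero boundary data is zero."""
    return sum(abs(v) for row in u for v in row)

simultaneous = make_mesh()
successive = make_mesh()
for _ in range(12):
    simultaneous = simultaneous_pass(simultaneous)
    successive = successive_pass(successive)
print(total_error(simultaneous), total_error(successive))
```

With this start, the successive-displacement error after 12 passes is the smaller of the two.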

If the interior mesh points are ordered, say as p_1, p_2, ..., p_k, and at
each point p_i the value of u(x, y) is regarded as a variable to be
determined, then each of the methods A, B, or C of approximating u(x, y)
over R may be written in matrix form as

P X = Q (38)

where X = (x_1, x_2, ..., x_k) is a vector of unknowns corresponding to
the values of u(x, y) at the interior mesh points of R, P is a coefficient
matrix of known constants determined by the type of approximation (A, B,
or C) being used, and Q is a vector of known constants determined by the
approximation being used and known boundary values for u(x, y). The
system (38) may be solved by relaxation methods discussed under Item 3
above.


If for a given method of approximation to u(x, y) over the interior of R,
the corresponding matrix [as cited in (38)] is constructed and w_opt (see




Note the correspondence of (40) and (18). Similar modifications of
approximations B and C are readily specified. Although modification of the
method of successive displacements by the use of w_opt in the fashion of
(40) increases the convergence rate, the use of w_opt in conjunction with
the method of simultaneous displacements is of no profit.


The numerical solution to Laplace's equation over a rectangular region
partitioned by an equally spaced rectangular mesh is specified easily in
terms of approximations A, B, or C and the methods of simultaneous or
successive displacements. An immediate question arises as to which of
the available techniques offers the most rapid convergence. To compare
the relative merits of the techniques outlined above, code ITEST was
written in FORTRAN IV for the IBM 1410. ITEST will solve Laplace's
equation over a 9-by-9 square mesh of equal mesh spacings using
approximations A, B, or C (or combinations) in conjunction with
simultaneous or successive displacements. Table VIII-1 lists some results
obtained using ITEST. For each of the three runs listed, u(x, y) was
specified to be zero on the boundary. The true solution for u(x, y) was
then u(x, y) = 0 in all cases. In run 1, u(x, y) initially was specified
to be zero at each interior mesh point except at the "center" point, which
was specified as 1.0. In runs 2 and 3, u(x, y) was specified as 1.0 at
each interior mesh point. For each of the three runs, each of seven
different combinations of approximations A, B, and C was used for 12
iterative passes over the mesh. The seven combinations of A, B, and C are
listed in column 1 of Table VIII-1. In runs 1 and 2, the method of
simultaneous displacements was used while run 3 employed successive
displacements.

TABLE VIII-1 - RESIDUES AFTER TWELVE ITERATIONS FOR RUNS 1, 2, AND 3

Approximation                 Residues after 12 iterations
sequence                      Run 1       Run 2       Run 3

A, A, A, ...                  0.606       15.482      6.458
B, B, B, ...                  0.218        5.989      1.071
C, C, C, ...                  0.506       12.856      4.567
A, B, A, B, ...               0.380        9.602      2.746
A, C, A, C, ...               0.554       14.105      5.436
B, C, B, C, ...               0.346        8.757      2.285
A, B, C, A, B, C, ...         0.418       10.581      3.267

For each iterative pass over the mesh, a "residue" term was calculated.
The residue term is just the sum of the absolute value of the errors in the
approximation to u(x, y) at the interior mesh points. Columns 2, 3, and
4 of Table VIII-1 list the residue term computed after the twelfth
iterative pass for each of the seven combinations of A, B, and C for runs
1, 2, and 3, respectively. Figures VIII-1 through VIII-9 contain the
pointwise approximations to u(x, y) obtained after 12 iterative passes
over the 9-by-9 mesh on runs 1, 2, and 3 using successive approximations
A, A, A, ...; B, B, B, ...; and C, C, C, ....
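The residue calculation for the simultaneous-displacement runs can be sketched as follows. This is not the ITEST code; it is a reconstruction under the assumptions that the 9-by-9 mesh includes its boundary points (so the interior is 7-by-7) and that a sequence such as A, B, A, B, ... applies one approximation per pass. The names `stencil` and `residue_after` are illustrative.

```python
def stencil(u, i, j, kind):
    """Approximation A, B, or C at interior point (i, j)."""
    axial = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
    diagonal = (u[i + 1][j + 1] + u[i + 1][j - 1]
                + u[i - 1][j + 1] + u[i - 1][j - 1])
    if kind == "A":
        return axial / 4.0
    if kind == "B":
        return diagonal / 4.0
    return (4.0 * axial + diagonal) / 20.0        # kind == "C"

def residue_after(start, sequence, passes=12):
    """Residue after `passes` simultaneous-displacement passes."""
    u = [row[:] for row in start]
    n = len(u)
    for k in range(passes):
        kind = sequence[k % len(sequence)]        # cycle through the sequence
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = stencil(u, i, j, kind)
        u = new
    # The true solution is zero, so each interior value is itself the error.
    return sum(abs(u[i][j]) for i in range(1, n - 1) for j in range(1, n - 1))

run1 = [[0.0] * 9 for _ in range(9)]
run1[4][4] = 1.0                                  # 1.0 at the center point only
run2 = [[0.0] * 9 for _ in range(9)]
for i in range(1, 8):
    for j in range(1, 8):
        run2[i][j] = 1.0                          # 1.0 at every interior point

for sequence in ("A", "B", "C", "AB", "AC", "BC", "ABC"):
    print(sequence, round(residue_after(run1, sequence), 3),
          round(residue_after(run2, sequence), 3))
```

Under these assumptions the sketch should agree with the run 1 and run 2 columns of Table VIII-1; for instance, the A and B rows of run 2 come out near 15.482 and 5.989. Run 3 is not reproduced here because its value depends on the ordering of the points within a pass, which the text does not state.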

Inspection of the table and figures cited above reveals that for the methods
tested, the most rapid convergence is obtained by using the method of
successive displacements and approximation B (X). The convergence rate
for B could have been accelerated by the use of w_opt. It will be noted
that in solving Laplace's equation over a mesh by methods A, B, or C, the
sum of errors in the approximation of u(x, y) at interior points will
remain a constant until the iterative procedure successively spreads the
error at a point(s) to the boundary; it is only when boundary values are
brought to bear that the total pointwise error in the interior of the mesh
can be reduced. The method of simultaneous displacements could be
implemented easily on the parallel processor described in Appendix VI.
However, the slow rate of convergence obtained using simultaneous
displacements and the difficulty of obtaining applicable acceleration
parameters make the method somewhat unattractive, even for parallel
processors.

Figure VIII-8 - Run 3, Successive Displacements, Approximation B (X),
12 Iterations


b.  Mesh  Fill  In 

Iterative numerical solutions to Laplace's equation over a mesh begin with
the assumption of some initial values for u(x, y) at interior points.
Clearly, the greater the accuracy of the initial approximations, the more
rapid should be the convergence. A method is described here for filling
the interior of a mesh rapidly with accurate initial approximations to
u(x, y) based on known boundary values. This will be confined to the
9-by-9 grid previously cited. Extension of the method to any (2^k + 1) by
(2^k + 1) mesh is immediate.


Let the value of u(x, y) be denoted at points of the 9-by-9 mesh as u(i, j),
with i and j being determined as in matrix notation. Since {u(5, 1),
u(5, 9), u(1, 5), u(9, 5)} are known, u(5, 5) may be approximated by A.
Knowing u(5, 5), {u(3, 3), u(3, 7), u(7, 3), u(7, 7)} may be approximated
by B, then {u(5, 3), u(5, 7), u(3, 5), u(7, 5)} by A, etc. For a 9-by-9
mesh, five such passes are required for complete fill-in of interior
points. Figure VIII-10 illustrates how the fill-in proceeds for each pass.
The numbers above the points in the mesh of this illustration indicate the
pass in which corresponding approximations to u(x, y) were made. The
approximations made during each pass depend only on boundary values or
results of the previous passes, or both; hence, they are amenable to
parallel computation. Accordingly, the above method of mesh fill-in is
referred to as parallel fill-in (PFI). PFI can be executed readily on
parallel processors where rapid, universal communication between
processing elements is available.
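The pass structure of PFI can be sketched as below. This is not the FILIN code; it is a minimal reconstruction assuming 0-based indices (u(5, 5) of the text is `u[4][4]`) and a square mesh of side 2^k + 1. The name `parallel_fill_in` and the test boundary data are illustrative. Within each pass every assignment reads only boundary values or values from earlier passes, so the loops over a pass could be distributed over processing elements; here they simply run in sequence.

```python
def parallel_fill_in(u):
    """Fill the interior of a (2**k + 1)-square mesh in place; return the pass count."""
    n = len(u) - 1                      # n = 2**k
    h = n // 2
    # Pass 1: the center point by approximation A at spacing h.
    u[h][h] = 0.25 * (u[0][h] + u[n][h] + u[h][0] + u[h][n])
    passes = 1
    while h > 1:
        h //= 2
        # B pass: points whose coordinates are both odd multiples of h.
        for i in range(h, n, 2 * h):
            for j in range(h, n, 2 * h):
                u[i][j] = 0.25 * (u[i - h][j - h] + u[i - h][j + h]
                                  + u[i + h][j - h] + u[i + h][j + h])
        # A pass: remaining points on the spacing-h grid.
        for i in range(h, n, h):
            for j in range(h, n, h):
                if (i // h + j // h) % 2 == 1:
                    u[i][j] = 0.25 * (u[i - h][j] + u[i + h][j]
                                      + u[i][j - h] + u[i][j + h])
        passes += 2
    return passes

# Boundary data taken from u = 1 + 2i - 3j + 0.5ij, which satisfies
# Laplace's equation; interior values start at zero.
exact = [[1.0 + 2.0 * i - 3.0 * j + 0.5 * i * j for j in range(9)] for i in range(9)]
u = [[exact[i][j] if i in (0, 8) or j in (0, 8) else 0.0 for j in range(9)]
     for i in range(9)]
passes = parallel_fill_in(u)
worst = max(abs(u[i][j] - exact[i][j]) for i in range(9) for j in range(9))
print(passes, worst)
```

For this particular boundary function both A and B reproduce the exact value at every point, so the fill-in is exact up to rounding; for general boundary data PFI gives only an initial approximation.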

To check the accuracy of PFI, code FILIN was written in FORTRAN IV
for the IBM 1410. Given values for u(x, y) on the boundary of a 9-by-9
rectangular mesh of equal spacing, FILIN calculates initial
approximations to u(x, y) at interior points by PFI. FILIN was run for 11
different sets of boundary conditions. Figures VIII-11 through VIII-21
give the boundary conditions and resultant fill-in based on PFI for the 11
runs. Inspection of these figures reveals the excellent results achieved
by PFI for the boundary conditions specified. Although the implementation
of PFI is more suitable to parallel processors, its accuracy is such that
it is to be recommended for use on sequential machines.

c. Parallel Execution

The numerical solution to Laplace's equation over a mesh by simultaneous
displacements is structurally well suited to parallel computation. For a
parallel processor of sufficient size, a processing unit could be assigned
to each interior mesh point. Each unit then would compute and store, in
an iterative fashion, approximations to u(x, y) at its assigned point. The
communication capabilities of Machines I or II (Appendixes VI and XV)
would allow the use of any combination of the approximations A, B, or C.
In the event that the number of interior mesh points exceeded the number
of processing units, each unit could be assigned a block of interior mesh
points and the iteration could proceed "parallel by block and sequential by
point within a block." However, although the method of simultaneous
displacements is structurally well suited to parallel execution, its low
rate of convergence makes it somewhat unattractive.
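One way to read "parallel by block and sequential by point within a block" is sketched below. The partition into strips of rows, the use of approximation A, and the name `block_pass` are assumptions made for illustration; the report does not fix these details. Points inside a block see values already updated within that block during the current pass, while values belonging to other blocks are taken from the preceding pass, so the blocks do not depend on one another within a pass.

```python
def block_pass(u, blocks):
    """One pass; `blocks` is a list of (first_row, last_row) strips of interior rows."""
    n = len(u)
    old = [row[:] for row in u]
    new = [row[:] for row in u]
    for first_row, last_row in blocks:            # each strip could run on its own unit
        for i in range(first_row, last_row + 1):
            for j in range(1, n - 1):
                # Row above: current-pass value only if it lies in the same strip.
                up = new[i - 1][j] if i - 1 >= first_row else old[i - 1][j]
                down = old[i + 1][j]              # not yet updated in this pass
                left = new[i][j - 1]              # same row, hence same strip
                right = old[i][j + 1]             # not yet updated in this pass
                new[i][j] = 0.25 * (up + down + left + right)
    return new

mesh = [[0.0] * 9 for _ in range(9)]
for i in range(1, 8):
    for j in range(1, 8):
        mesh[i][j] = 1.0                          # start of runs 2 and 3

blocks = [(1, 3), (4, 5), (6, 7)]                 # hypothetical three-unit partition
for _ in range(12):
    mesh = block_pass(mesh, blocks)
block_residue = sum(abs(v) for row in mesh for v in row)
print(block_residue)
```

With a single strip covering all rows this reduces to successive displacements in row order.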

The method of successive displacements, while apparently unsuited from
a structural point of view, is in fact quite attractive for parallel
computation. The rate of convergence for the method of successive
displacements can be improved by the use of acceleration parameters.
Further, since in practice the number of internal mesh points involved in
the solution of Laplace's equation will greatly exceed the number of
processing units available on a parallel processor, a "parallel by block,
sequential by point within a block" type iteration must be used and such
an iteration is well suited to the method of successive displacements.

The PFI method for obtaining initial approximations to Laplace's
equation over the interior points of a mesh is ideally suited to parallel
execution.

A test for convergence based on maximum pointwise change in
approximation values between successive iterations could be accomplished
readily on Machines I or II due to the rapid sort capability.

5.  CONCLUSIONS 

This appendix has reviewed several mathematical techniques and analyzed
their suitability for parallel execution. These techniques are Jacobi's
method for the determination of eigenvalues of real symmetric matrices,
the relaxation solution to a system of linear algebraic equations,
numerical solution to Laplace's equation, and mesh fill-in. Each technique
was seen to be amenable to parallel execution. It was further seen that
each technique involved searching a set for the element of maximum
magnitude, a process well suited to a machine having sorting capability.

The  inherent  parallelism  resident  in  each  of  the  techniques  provides  a 
suitable  basis  for  a  study  to  determine  optimal  methods  of  parallel  exe¬ 
cution. 

APPENDIX IX - MACRO INSTRUCTIONS FOR A PARALLEL PROCESSOR


1.  INTRODUCTION 

Concurrent with efforts directed toward the design and efficient
utilization of parallel processors has been the realization that
computational capabilities resident in parallel processors give rise to new
ways of thinking about problems, their fundamental structure, and
appropriate solution models. It is therefore desirable that macro machine
instructions, in fact a programming language, be developed that allow and
indeed promote ease of conceiving and expressing the structure of parallel
solution models. This appendix presents a brief list of instructions
capable of compactly representing operations within a parallel solution
model. Computational examples are given along with a suggestion for
expanding and generalizing the instructions into a programming language.

2.  DEFINITIONS 

Let the Greek letters α, β, γ, ... denote vectors of the form

α = (α_1, α_2, ..., α_n) (1)

where α_i (i = 1, 2, ..., n) is a real number unless otherwise specified.
In expressions such as

α = (α_1, α_2, ...)_n , (2)

the subscript n means that α is to be considered an n-vector.

3.  INSTRUCTIONS 
a.  General 


In the following instructions, α, β, γ are as defined in (1) and ε denotes
a real number. The elements α_i, i = 1, 2, ..., n of a vector α =
(α_1, α_2, ..., α_n) correspond to real numbers stored in a parallel
processor. In general, an instruction will specify the execution by the
parallel processor of some rule of assignment, Γ, that associates with the
vector(s) α a vector γ. For example:

1.  Suppose there is a vector α = (α_1, α_2, ..., α_n) and the vector
    γ = 2α = (2α_1, 2α_2, ..., 2α_n) = (γ_1, γ_2, ..., γ_n) is desired.
    Then an instruction is specified directing the parallel processor
    to effect the following rule of assignment:

    Γ: α → 2α = γ .

2.  Suppose there are vectors α = (α_1, α_2, ..., α_n) and
    β = (β_1, β_2, ..., β_n) and the vector γ = α + β =
    (α_1 + β_1, α_2 + β_2, ..., α_n + β_n) = (γ_1, γ_2, ..., γ_n) is
    desired. Then an instruction is specified directing the parallel
    processor to effect the following rule of assignment:

    Γ: (α, β) → α + β = γ .

b.  List  of  Instructions 


A  list  of  instructions  follows.  They  are  designed  primarily  to  specify 
the  parallel  execution  of  common  arithmetic  operations  frequently  en¬ 
countered  in  computational  procedures. 


1.  Shift right/left: (→, t)(α)/(←, t)(α). This instruction operates on
    a single vector, α = (α_1, α_2, ..., α_n), to produce a vector
    γ = (γ_1, γ_2, ..., γ_n) under the rule of assignment:

    $$\gamma_i = \begin{cases} 0 & \text{for } 1 \le i \le t \\ \alpha_{i-t} & \text{for } t < i \le n \end{cases} \qquad \text{for } (\rightarrow, t)(\alpha) ,$$

    and

    $$\gamma_i = \begin{cases} 0 & \text{for } (n - t) < i \le n \\ \alpha_{i+t} & \text{for } 1 \le i \le (n - t) \end{cases} \qquad \text{for } (\leftarrow, t)(\alpha) .$$

    Note: α is unchanged; any overflow is lost.

    Example: Let α = (1, 3, 5, 7). Then

    γ = (→, 2)(α) = (0, 0, 1, 3) ,
    γ = (←, 1)(α) = (3, 5, 7, 0) .

2.  Shift right/left one: (1→, t)(α)/(←1, t)(α). This instruction
    operates on a single vector, α = (α_1, α_2, ..., α_n), to produce a
    vector γ = (γ_1, γ_2, ..., γ_n) under the rule of assignment:

    $$\gamma_i = \begin{cases} 1 & \text{for } 1 \le i \le t \\ \alpha_{i-t} & \text{for } t < i \le n \end{cases} \qquad \text{for } (1\rightarrow, t)(\alpha) ,$$

    and

    $$\gamma_i = \begin{cases} 1 & \text{for } (n - t) < i \le n \\ \alpha_{i+t} & \text{for } 1 \le i \le (n - t) \end{cases} \qquad \text{for } (\leftarrow 1, t)(\alpha) .$$

    Note: α is unchanged; any overflow is lost.

    Example: Let α = (7, 9, 1, 8, 5). Then

    γ = (1→, 3)(α) = (1, 1, 1, 7, 9) ,
    γ = (←1, 4)(α) = (5, 1, 1, 1, 1) .

3.  Spread right/left: <j→, t>(α)/<←j, t>(α). This instruction
    operates on a single vector, α = (α_1, α_2, ..., α_n), to produce a
    vector γ = (γ_1, γ_2, ..., γ_n) under the following rule of
    assignment:

    $$\gamma_i = \begin{cases} \alpha_i & \text{for } 1 \le i < j \\ \alpha_j & \text{for } j \le i \le (j + t) \\ \alpha_i & \text{for } (j + t) < i \le n \end{cases} \qquad \text{for } \langle j\rightarrow, t\rangle(\alpha) ,$$

    and

    $$\gamma_i = \begin{cases} \alpha_i & \text{for } 1 \le i < (j - t) \\ \alpha_j & \text{for } (j - t) \le i \le j \\ \alpha_i & \text{for } j < i \le n \end{cases} \qquad \text{for } \langle \leftarrow j, t\rangle(\alpha) .$$

    Note: α is unchanged; any overflow is lost.

    Example: Let α = (7, 9, 1, 8, 5). Then

    γ = <2→, 2>(α) = (7, 9, 9, 9, 5) ,
    γ = <←3, 4>(α) = (1, 1, 1, 8, 5) .

    Note: overflow occurs in the above example.

    Footnote: In this and the following instructions where "α is
    unchanged," if α = γ [for example, α = (→, t)(α)], the positions of α
    are assigned new values under the rule of assignment for the
    instruction.

4.  Rotate right/left: (RR, t)(α)/(RL, t)(α). This instruction operates
    on a single vector, α = (α_1, α_2, ..., α_n), altering it as follows.
    The elements of α are shifted right/left t positions. Overflow out
    the right/left is added in on the left/right.

    Example: Let α = (7, 9, 1, 8, 5). Then

    (RR, 2)(α) = (8, 5, 7, 9, 1) → α ,
    (RL, 3)(α) = (8, 5, 7, 9, 1) → α .

5.  Set sign plus/minus: [SSP](α)/[SSM](α). This instruction operates on
    a single vector, α = (α_1, α_2, ..., α_n), altering it as follows.
    Each element α_i of α is set to |α_i| / -|α_i|.

    Example: Let α = (-1, 0, 7, -4, 12). Then

    [SSP](α) = (1, 0, 7, 4, 12) → α ,
    [SSM](α) = (-1, 0, -7, -4, -12) → α .

6.  Scalar add/subtract/multiply/divide: [+, ε]/[-, ε]/[×, ε]/[÷, ε](α).
    This instruction operates on a single vector, α = (α_1, α_2, ...,
    α_n), to produce a vector γ = (γ_1, γ_2, ..., γ_n) under the rule of
    assignment:

    γ_i = (α_i + ε)/(α_i - ε)/(α_i × ε)/(α_i ÷ ε) .

    Note: α is unchanged.

    Example: Let α = (7, 9, 1, 8, 5), ε = 3. Then

    γ = [+, ε](α) = (10, 12, 4, 11, 8) ,
    γ = [-, ε](α) = (4, 6, -2, 5, 2) ,
    γ = [×, ε](α) = (21, 27, 3, 24, 15) ,
    γ = [÷, ε](α) = (7/3, 3, 1/3, 8/3, 5/3) .

7.  Vector add/subtract/multiply/divide: ⊕/⊖/⊗/⊘ (α, β). This
    instruction operates on an ordered pair of vectors (α, β);

    α = (α_1, α_2, ..., α_n)
    β = (β_1, β_2, ..., β_n) ,

    to produce a vector

    γ = (γ_1, γ_2, ..., γ_n)

    under the following rule of assignment:

    γ_i = (α_i + β_i)/(α_i - β_i)/(α_i × β_i)/(α_i ÷ β_i) .

    Note: α and β are unchanged.

    Example: Let α = (7, 9, 1, 8, 5), β = (2, -3, 1/3, -16, 1). Then

    ⊕(α, β) = (9, 6, 4/3, -8, 6) ,
    ⊖(α, β) = (5, 12, 2/3, 24, 4) ,
    ⊗(α, β) = (14, -27, 1/3, -128, 5) ,
    ⊘(α, β) = (7/2, -3, 3, -1/2, 5) .

8.  Sum: Σ(α). This instruction effectively is a subroutine. It operates
    on a single vector, α = (α_1, α_2, ..., α_n), to produce a 1-vector
    γ = (γ_1) under the following rule of assignment:

    $$\gamma_1 = \sum_{i=1}^{n} \alpha_i .$$

    Note: α is unchanged.

    Example: Let α = (7, 9, 1, 8, 5). Then

    γ = Σ(α) = (30) .

9.  Chain: Π(α). This instruction effectively is a subroutine. It
    operates on a single vector, α = (α_1, α_2, ..., α_n), to produce a
    1-vector γ = (γ_1) under the following rule of assignment:

    $$\gamma_1 = \prod_{i=1}^{n} \alpha_i .$$

    Note: α is unchanged.

    Example: Let α = (7, 9, 1, 8, 5). Then

    γ = Π(α) = (2520) .
10. Create: C(α, n)(α_1, α_2, ..., α_n). This instruction causes a vector
    of length n, called α, to be stored in the parallel processor with
    elements α_i specified with the instruction.

Example: 

C (a.  7j{0,  1.  0.  0,  3fc.  0.  0)  —  a  . 

11. If: IF (α, ε) r, s, t. This instruction specifies a transfer of
    program control according to the following rules:

    a.  Let α = (α_1, α_2, ..., α_n) and ε = (ε_1, ε_2, ..., ε_n) be
        n-vectors.

    b.  Let r, s, t specify locations to which program control can be
        transferred.

    c.  Then program control will be transferred to r, s, or t according
        to whether α_i < ε_i, α_i = ε_i, or α_i > ε_i
        for all i = 1, 2, ..., n.
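The shift, spread, and rotate instructions can be modeled on a sequential machine as list operations. The sketch below is illustrative only: the function names and argument conventions are assumptions, positions are 1-based as in the rules of assignment, and each function returns a new list (whereas the rotate instruction in the text alters α itself).

```python
def shift(a, t, right=True, fill=0):
    """Instructions 1 and 2: shift t places, fill with 0 (or 1); overflow is lost."""
    n = len(a)
    t = min(t, n)
    if right:
        return [fill] * t + a[:n - t]
    return a[t:] + [fill] * t

def spread(a, j, t, right=True):
    """Instruction 3: copy element j over the t positions to its right or left."""
    n = len(a)
    out = a[:]
    if right:
        lo, hi = j, min(j + t, n)        # positions beyond n overflow and are lost
    else:
        lo, hi = max(j - t, 1), j        # positions below 1 overflow and are lost
    for i in range(lo, hi + 1):
        out[i - 1] = a[j - 1]
    return out

def rotate(a, t, right=True):
    """Instruction 4: shift t places with end-around carry."""
    n = len(a)
    t %= n
    if right:
        return a[n - t:] + a[:n - t]
    return a[t:] + a[:t]

alpha = [7, 9, 1, 8, 5]
print(shift([1, 3, 5, 7], 2))                    # (→, 2)
print(shift(alpha, 3, fill=1))                   # (1→, 3)
print(spread(alpha, 2, 2))                       # <2→, 2>
print(spread(alpha, 3, 4, right=False))          # <←3, 4>
print(rotate(alpha, 2), rotate(alpha, 3, right=False))
```

The calls shown correspond to the worked examples given with instructions 1 through 4.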

4.  SAMPLE  PROGRAMS 

a.  General 

Some  sample  programs  written  in  terms  of  the  instruction  list  are  ex¬ 
hibited  below.  The  existence  of  a  "DO  LOOP"  type  instruction  is  as¬ 
sumed. 

b.  Program 1

Given: X_0, Δ, n

Construct: V = (X_0, X_0 + Δ, X_0 + 2Δ, ..., X_0 + nΔ)_{n+1}

Define: L = [log_2 (n - 1)] (by [X] is meant the greatest integer in X)

Procedure:

      C(V, n + 1)(0, Δ, 0, 0, ..., 0)
      DO M k = 0, L
      V* = (→, 2^k)(V)
      V** = <(2^k + 1)→, 2^k>(V)
   M  V = ⊕(V*, V**)
      V = [+, X_0](V)

Then

      V = (X_0, X_0 + Δ, X_0 + 2Δ, ..., X_0 + nΔ) .

Example: Let n = 8. Then L = [log_2 (7)] = 2.

The program would proceed as follows:

      C(V, 9)(0, Δ, 0, 0, 0, 0, 0, 0, 0)

Going through the DO loop would give:

k = 0

      V* = (→, 1)(V) = (0, 0, Δ, 0, 0, 0, 0, 0, 0)
      V** = <2→, 1>(V) = (0, Δ, Δ, 0, 0, 0, 0, 0, 0)
      V = ⊕(V*, V**) = (0, Δ, 2Δ, 0, 0, 0, 0, 0, 0)

k = 1

      V* = (→, 2)(V) = (0, 0, 0, Δ, 2Δ, 0, 0, 0, 0)
      V** = <3→, 2>(V) = (0, Δ, 2Δ, 2Δ, 2Δ, 0, 0, 0, 0)
      V = ⊕(V*, V**) = (0, Δ, 2Δ, 3Δ, 4Δ, 0, 0, 0, 0)

k = 2

      V* = (→, 4)(V) = (0, 0, 0, 0, 0, Δ, 2Δ, 3Δ, 4Δ)
      V** = <5→, 4>(V) = (0, Δ, 2Δ, 3Δ, 4Δ, 4Δ, 4Δ, 4Δ, 4Δ)
      V = ⊕(V*, V**) = (0, Δ, 2Δ, 3Δ, 4Δ, 5Δ, 6Δ, 7Δ, 8Δ)

and finally

      V = [+, X_0](V) = (X_0, X_0 + Δ, ..., X_0 + 8Δ)
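Program 1 builds the n + 1 values in about log_2 n vector steps by doubling the filled portion of V on each trip through the loop. A sequential sketch follows; the helper names are illustrative, vectors are Python lists addressed with 1-based positions as in the text, and n >= 2 is assumed so that L is defined.

```python
def shift_right(v, t):
    """(→, t)(V): shift right t places with zero fill; overflow is lost."""
    t = min(t, len(v))
    return [0.0] * t + v[:len(v) - t]

def spread_right(v, j, t):
    """<j→, t>(V): copy element j (1-based) into positions j .. j + t."""
    out = v[:]
    for i in range(j, min(j + t, len(v)) + 1):
        out[i - 1] = v[j - 1]
    return out

def arithmetic_progression(x0, delta, n):
    """Return (x0, x0 + delta, ..., x0 + n*delta) following Program 1."""
    v = [0.0] * (n + 1)
    v[1] = delta                                   # C(V, n + 1)(0, Δ, 0, ..., 0)
    last = (n - 1).bit_length() - 1                # L = [log2(n - 1)]
    for k in range(last + 1):                      # DO M k = 0, L
        step = 2 ** k
        v_star = shift_right(v, step)              # V*
        v_2star = spread_right(v, step + 1, step)  # V**
        v = [p + q for p, q in zip(v_star, v_2star)]
    return [x0 + p for p in v]                     # V = [+, X_0](V)

result = arithmetic_progression(10.0, 0.5, 8)
print(result)
```

After the trip with index k, positions 1 through 2^k + 1 + 2^k hold the multiples 0, Δ, ..., 2^(k+1) Δ, which is why L = [log_2 (n - 1)] trips suffice.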

c.  Program 2

Given: α = (α_1, α_2, ..., α_n), β = (β_1, β_2, ..., β_n)

Construct: γ = α · β = the scalar product of α and β

Procedure:

      V = ⊗(α, β)
      V = Σ(V)

Then V = (γ_1), where

$$\gamma_1 = \sum_{i=1}^{n} \alpha_i \beta_i .$$

Example: Let α = (1, 3, 5), β = (2, 4, 6). Then

      V = ⊗(α, β) = (2, 12, 30)
      V = Σ(V) = (44)
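In sequential form the two steps of Program 2 are an elementwise product followed by a sum. The function names below are illustrative; on a parallel processor the n products would be formed at once.

```python
def vector_multiply(a, b):
    """⊗(α, β): elementwise product of two vectors of equal length."""
    return [x * y for x, y in zip(a, b)]

def vector_sum(a):
    """Σ(α): a 1-vector holding the sum of the elements."""
    return [sum(a)]

alpha = [1, 3, 5]
beta = [2, 4, 6]
v = vector_multiply(alpha, beta)
products = v[:]
v = vector_sum(v)
print(products, v)
```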

d.  Program 3

Given: α = (α_1, α_2, ..., α_n), ε = (ε_1, ε_2, ..., ε_n),
where α_i > 0, ε_i > 0 for i = 1, 2, ..., n

Construct: γ = (γ_1, γ_2, ..., γ_n) where γ_i = √α_i and ε_i is the
convergence criterion for a Newton iteration

Comments: A Newton iteration for finding √x proceeds as follows:

$$g_{i+1} = \tfrac{1}{2}\left(\frac{x}{g_i} + g_i\right)$$

where g_i denotes the i-th approximation to √x. The manner of determining
the initial guess, g_0, depends on the range of x and, in computer
solutions, the manner in which a number x is stored in the machine. In the
program to follow, x/2 is used as an initial guess to √x.

      Procedure               Corresponds to

      G = [×, 0.5](α)         g_i = x/2, initial guess
   m  B = ⊘(α, G)             x/g_i
      θ = ⊕(B, G)             x/g_i + g_i
      γ = [×, 0.5](θ)         g_{i+1} = (x/g_i + g_i)/2
      δ = ⊖(γ, G)             g_{i+1} - g_i
      δ = [SSP](δ)            |g_{i+1} - g_i|
      IF (δ, ε) r, r, t       |g_{i+1} - g_i| < ε ?
   t  G = γ                   No, (i + 1) → i
      Go to m                 Iterate again
   r  Continue                Yes, g_{i+1} ≈ √x

and then γ = (γ_1, γ_2, ..., γ_n) where γ_i ≈ √α_i.
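The Newton program can be followed step by step in a sequential sketch. The name `newton_sqrt` is illustrative, and one point is an assumption: the IF instruction is read here as "leave the loop when every δ_i is less than or equal to ε_i, otherwise iterate again", since the text does not say what happens when the comparison differs from element to element.

```python
import math

def newton_sqrt(alpha, eps):
    """Elementwise square roots of positive numbers by Newton iteration."""
    g = [0.5 * x for x in alpha]                        # G = [×, 0.5](α)
    while True:
        b = [x / gi for x, gi in zip(alpha, g)]         # B = ⊘(α, G)
        theta = [bi + gi for bi, gi in zip(b, g)]       # θ = ⊕(B, G)
        gamma = [0.5 * th for th in theta]              # γ = [×, 0.5](θ)
        delta = [abs(ga - gi) for ga, gi in zip(gamma, g)]  # ⊖ then [SSP]
        if all(d <= e for d, e in zip(delta, eps)):     # IF (δ, ε) r, r, t
            return gamma                                # r: Continue
        g = gamma                                       # t: G = γ, go to m

roots = newton_sqrt([4.0, 9.0, 2.0], [1e-12, 1e-12, 1e-12])
print(roots, math.sqrt(2.0))
```

All elements iterate together until the slowest one meets its criterion, which mirrors the lock-step operation of a parallel processor executing one instruction stream.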

5.  OBSERVATIONS 

Some  of  the  properties  of  the  instructions  listed  above  are  as  follows: 

1.  Instructions 1 through 4 involve essentially a shifting right or
    left of the elements of a vector α = (α_1, α_2, ..., α_n) with
    options of dropping overflow with corresponding fill-in by 0's, 1's,
    or end-around carry. The resulting vector is an n-vector
    γ = (γ_1, γ_2, ..., γ_n) with elements from
    {0, 1, {α_i} (i = 1, 2, ..., n)}.

2.  Instructions 5 and 6 involve a specified arithmetic operation on
    each of the elements of a vector α = (α_1, α_2, ..., α_n). The
    resulting vector is an n-vector γ = (γ_1, γ_2, ..., γ_n) with
    elements specified in terms of {α_i}, i = 1, 2, ..., n and a common
    arithmetic operation.

3.  Instruction 7 involves an ordered pair of vectors (α, β),
    α = (α_1, α_2, ..., α_n), β = (β_1, β_2, ..., β_n) and a specified
    arithmetic operation for each of the couples (α_i, β_i),
    i = 1, 2, ..., n. The resulting vector is an n-vector
    γ = (γ_1, γ_2, ..., γ_n) with elements specified in terms of the
    couples (α_i, β_i), i = 1, 2, ..., n and a common arithmetic
    operation.

4.  Instructions 8 and 9 involve a specified arithmetic operation
    applied to the set of elements {α_i}, i = 1, 2, ..., n of a vector
    α = (α_1, α_2, ..., α_n). The resulting vector γ = (γ_1) is a
    1-vector whose single element is specified in terms of the set
    {α_i}, i = 1, 2, ..., n and an arithmetic operation.

5.  Instruction 10 creates a new vector with elements specified by the
    programmer.

6.  Instruction 11 specifies a transfer of control, based on the results
    of a test.

The above observations suggest that further study of these and other
instructions, yet to be defined, will produce new insights into the nature
of problems, possible solutions, and notations in terms of which solution
models may be written. Experience has, in fact, already shown this to be
true. The instructions discussed in this report were the results of an
effort to determine arithmetic operations that would be frequently
encountered in machine computation, and instructions that would compactly
specify parallel execution of such operations. The list of instructions
is quite short. Efforts to express parallel solution models in terms of
these instructions can be expected to produce changes in instruction form
and definition, suggest new instructions, and lead to the formulation of a
FORTRAN type language. The execution on a parallel processor of programs
written in such a language would require the construction of a compiler to
translate instructions of the type listed into an efficient program of
micro instructions acceptable to the processor.

6.  CONCLUSIONS 

Attempts  to  write  parallel  solution  models  and  to  express  the  operations 
involved  in  a  compact  notation  have  led  to  the  development  of  a  prelimi¬ 
nary  list  of  macro  instructions  for  a  parallel  processor.  Experience  gained 
in  constructing  parallel  solution  models  and  writing  programs  for  them 
could  provide  a  basis  for  modifying  presently  proposed  instructions  and 
defining  new  ones. 

The  definition  and  modification  of  instructions  is  essentially  an  effort  to 
express  compactly  the  operations  characterizing  a  problem  and  structur¬ 
ing  possible  methods  of  solution.  Hence,  it  is  hoped  that  further  develop¬ 
ment  of  instructions  will  suggest  new  conceptual  modes  in  which  problems 
and  possible  solutions  may  be  analyzed,  and  provide  insights  into  the  na¬ 
ture  and  significance  of  parallelism  within  a  problem  and  methods  for  ex¬ 
ploiting  it  by  new  computational  procedures. 
1.  INTRODUCTION

Investigations into machine structure and parallel execution of coded
routines led logically to the problem of compiling a source program in
parallel. In other words, given a sequence of statements written in, say,
MAD (Michigan algorithm decoder) or FORTRAN (IBM formula translating
system), how can a parallel processor be used to compile the entire
set of statements in parallel?

In this appendix, an algorithm for parallel compilation is developed, a
method of simulation on a sequential machine is described, and the
results of a simulation for a small set of replacements are presented. It
is assumed that the language statements are written in MAD and that the
precedence hierarchy is that of Arden, Galler, and Graham (refs a, b). The
MAD language was chosen because documentation on its structure is readily
available and because techniques developed through MAD can be extended
to other languages.

2.  PARALLEL  COMPILATION 
a.  General 

During the process of compilation, a sequence of statements written in a
higher language such as MAD is translated into a sequence of machine
language statements. The compilation process usually decomposes higher
language statements into a matrix form of triples and then, from the
matrix, establishes a set of machine language statements. Included in
the compilation process is the handling of such considerations as
dimension, mode, and storage allocation.

ᵃ University of Michigan Computing Center: Michigan Algorithm Decoder.
Ann Arbor, Michigan, June 1963.

ᵇ Arden, B.; Galler, B.; and Graham, R.: "An Algorithm for Translating
Boolean Expressions," Journal of the ACM, April 1962; 9:222-239.

The compilation algorithm developed here deals only with the decomposition
of higher language statements into triples. The statements are restricted
to replacement types involving nonsubscripted variables. The previously
cited precedence hierarchy is limited to the set of operators given in
Table X-1.


TABLE X-1 - PRECEDENCE HIERARCHY

Operator      Description                                Precedence
.ABS.         Absolute value                             Highest
.P.           Exponentiation
-u            Unary minus
*, /          Multiplication, division
+, -          Plus, minus
=             Equals (substitution)
⊢, ⊣, (, )    Begin statement, end statement,            Lowest
              open parenthesis, close parenthesis

It  is  further  assumed  that  the  replacement  statements  are  stored,  symbol 
by  symbol,  in  an  ordered  list.  For  example,  the  MAD  statement 

F = A + B*.ABS.(C + D)                                        (1)

is assumed to be stored in a list as:

Index, i   Item
    0      ⊢
    1      F
    2      =
    3      A
    4      +
    5      B
    6      *
    7      .ABS.                                              (2)
    8      (
    9      C
   10      +
   11      D
   12      )
   13      ⊣


Later it is shown that the set of triples corresponding to (2) is just:

Row   Triple
 1    C    +      D
 2    0    .ABS.  R1
 3    B    *      R2                                          (3)
 4    A    +      R3
 5    F    =      R4

In (3), Ri denotes the resultant from the i-th triple (row). Now (3) is
read, row by row, as:

R1 = C + D
R2 = .ABS.(R1)
R3 = B * R2
R4 = A + R3
F  = R4
   = A + B*.ABS.(C + D)

Note that the final reading is just (1).
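
The row-by-row reading of (3) amounts to substituting each resultant back into the triples that use it. The following is a minimal sketch in Python, not part of the original report. The tuple layout, the names `TRIPLES` and `expand`, the numeric precedence values, and the use of "0" for the empty left operand of a unary operator are assumptions made for illustration.

```python
# Numeric precedences (assumed values; only their order follows Table X-1).
PREC = {".ABS.": 6, ".P.": 5, "*": 3, "/": 3, "+": 2, "-": 2, "=": 1}
ATOM = 7  # precedence assigned to a bare variable

# Triples of (3) as (left operand, operator, right operand); row i yields Ri.
TRIPLES = [
    ("C", "+", "D"),
    ("0", ".ABS.", "R1"),   # "0" marks the unused left operand of a unary op
    ("B", "*", "R2"),
    ("A", "+", "R3"),
    ("F", "=", "R4"),
]

def expand(triples):
    """Read the triples row by row and return the statement they encode."""
    known = {}  # resultant name -> (text, precedence of its top operator)
    text = ""
    for i, (left, op, right) in enumerate(triples, start=1):
        ltext, lprec = known.get(left, (left, ATOM))
        rtext, rprec = known.get(right, (right, ATOM))
        p = PREC[op]
        if left == "0":                      # unary operator
            if rprec < p:
                rtext = f"({rtext})"
            text = f"{op}{rtext}"
        else:                                # binary operator
            if lprec < p:
                ltext = f"({ltext})"
            if rprec <= p:
                rtext = f"({rtext})"
            sep = f" {op} " if op in ("+", "-", "=") else op
            text = ltext + sep + rtext
        known[f"R{i}"] = (text, p)
    return text

print(expand(TRIPLES))
```

Parentheses are reinserted only where a resultant of lower precedence becomes an operand of a higher-precedence operator, which is why the expansion of R1 inside .ABS. is wrapped while the others are not.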
b. Compilation Algorithm

In parallel compilation, one tries in successive passes to examine
simultaneously many statements such as (1), stored in the fashion of (2),
and to form on each pass all possible triples and simplifications for the
entire set of statements.

An algorithm for effecting parallel compilation is shown in Figure X-1.

On each pass, the tests (operations) indicated in Figure X-1 are applied
to a list such as (2). Sequences of items taken 3, 4, or 5 at a time are
sought that meet certain conditions (blanks are ignored). If the indicated
conditions obtain, triples are formed and/or statements are simplified
as indicated.

As the structure of the flow chart in Figure X-1 indicates, the four
operations may be executed concurrently; and the algorithm is capable of
decomposing, in parallel, all the substitution statements of a
source-language (MAD) program into a string of triples ready for final
assignment (machine language). Several passes through the loop may be
required, the number depending on the size and complexity of the program
to be compiled.

The operations indicated in Figure X-1 proceed as follows:

1. Operation 1 looks for quadruples ABCD, where A is an operator; B is
   either a "-u" or an ".ABS."; C is a variable; D is an operator such
   that P(D) ≤ P(B), where P(X) denotes the precedence of X as given in
   Table X-1. It is assumed that B is the α-th item on the input list.
   Variable C is removed, and B is replaced by the variable Rα. A triple
   is formed of 0, B, and C, and its resultant is stored in Rα.

2. Operation 2 looks for all quintuples ABCDE, where A, C, and E are
   operators; B and D are variables; and P(A) < P(C) ≥ P(E). It is
   assumed that C is the β-th item on the input list. Variables B and D
   are removed, C is replaced by Rβ, and a triple is formed of B, C, and
   D, with a resultant Rβ.

3. Operation 3 removes the parentheses surrounding single variables.

4. Operation 4 removes all sequences ⊢ A ⊣, where A is a variable.

Subsequent to the execution of these four operations, control returns to
Operation 1 if the input list is not empty. Otherwise there is an exit.
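
The four operations can be sketched as a pass over a snapshot of the item list, so that every test on a given pass sees the same list, as it would on a parallel processor. This is an illustrative Python sketch under stated assumptions, not the report's own program: the ASCII markers "|-" and "-|" stand in for the begin- and end-statement symbols, `None` stands for a blank, the numeric precedence values are arbitrary apart from their order (Table X-1), and the function names are hypothetical.

```python
# Precedences in the order of Table X-1 (numeric values are assumptions).
PREC = {".ABS.": 6, ".P.": 5, "-u": 4, "*": 3, "/": 3,
        "+": 2, "-": 2, "=": 1, "|-": 0, "-|": 0, "(": 0, ")": 0}

def is_op(x):
    return x in PREC

def one_pass(items, triples):
    """Apply Operations 1-4 to a snapshot of the list; return the new list."""
    live = [i for i, x in enumerate(items) if x is not None]  # skip blanks
    new = list(items)
    for k in range(len(live)):
        idx = live[k:k + 5]            # positions of up to 5 contiguous items
        w = [items[i] for i in idx]
        n = len(w)
        # Operation 1: operator, unary operator, variable, operator
        if (n >= 4 and is_op(w[0]) and w[1] in (".ABS.", "-u")
                and not is_op(w[2]) and is_op(w[3])
                and PREC[w[3]] <= PREC[w[1]]):
            r = f"R{idx[1]}"
            triples.append((r, "0", w[1], w[2]))
            new[idx[1]], new[idx[2]] = r, None
        # Operation 2: operator, variable, operator, variable, operator
        if (n == 5 and is_op(w[0]) and not is_op(w[1]) and is_op(w[2])
                and not is_op(w[3]) and is_op(w[4])
                and PREC[w[0]] < PREC[w[2]] >= PREC[w[4]]):
            r = f"R{idx[2]}"
            triples.append((r, w[1], w[2], w[3]))
            new[idx[1]], new[idx[2]], new[idx[3]] = None, r, None
        # Operation 3: parentheses around a single variable
        if n >= 3 and w[0] == "(" and not is_op(w[1]) and w[2] == ")":
            new[idx[0]] = new[idx[2]] = None
        # Operation 4: begin marker, variable, end marker
        if n >= 3 and w[0] == "|-" and not is_op(w[1]) and w[2] == "-|":
            new[idx[0]] = new[idx[1]] = new[idx[2]] = None
    return new

def compile_statement(tokens):
    """Repeat passes until the list is empty; return (triples, pass count)."""
    items, triples, passes = list(tokens), [], 0
    while any(x is not None for x in items):
        new = one_pass(items, triples)
        if new == items:
            raise ValueError("no test applies; statement is malformed")
        items, passes = new, passes + 1
    return triples, passes

STATEMENT = ["|-", "F", "=", "A", "+", "B", "*", ".ABS.",
             "(", "C", "+", "D", ")", "-|"]
triples, passes = compile_statement(STATEMENT)
for t in triples:
    print(t)
print("passes:", passes)
```

Because each resultant is named after the list position of its operator, the sketch yields names such as R10 and R7 for statement (1) stored as list (2), rather than the consecutive R1, R2, ... used in (3).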

In seven successive passes noted under Items c through i, below, the
compilation algorithm is applied to the statement (1), as stored in the
list (2), and the set of triples is developed. In each of these passes, the
procedure is to work through the scheme detailed in Figure X-1.

c. Pass 1

For Operation 1, no quadruple γ, α, V, β exists such that P(α) ≥ P(β),
α ∈ {.ABS., -u}.

For Operation 2, the quintuple (, C, +, D, ) fulfills the requirements of
α, V, β, W, γ, where P(α) < P(β) ≥ P(γ). Hence, + is replaced by R10, where
R10 is the triple C + D. Now C and D are removed from the list.

For Operation 3, there exists no triple (, V, ).

For Operation 4, there exists no triple ⊢, V, ⊣.

After Pass 1, the list (2) reads as follows (where Δ denotes a blank):

Index, i   Item
    0      ⊢
    1      F
    2      =
    3      A
    4      +
    5      B
    6      *
    7      .ABS.
    8      (
    9      Δ
   10      R10
   11      Δ
   12      )
   13      ⊣


d. Pass 2

Only the condition specified by Operation 3 obtains, namely for (R10). The
parentheses are removed from the list. After Pass 2, the list (2) reads:

Index, i   Item
    0      ⊢
    1      F
    2      =
    3      A
    4      +
    5      B
    6      *
    7      .ABS.
    8      Δ
    9      Δ
   10      R10
   11      Δ
   12      Δ
   13      ⊣
e. Pass 3

Only the condition specified by Operation 1 obtains, for *, .ABS., R10, ⊣.
Hence, .ABS. is replaced by R7, where R7 is the triple 0 .ABS. R10.
Now R10 is removed from the list. After Pass 3, the list (2) reads:

Index, i   Item
    0      ⊢
    1      F
    2      =
    3      A
    4      +
    5      B
    6      *
    7      R7
    8      Δ
    9      Δ
   10      Δ
   11      Δ
   12      Δ
   13      ⊣

f. Pass 4

Only the condition specified by Operation 2 obtains, for +, B, *, R7, ⊣.
Hence, * is replaced by R6, where R6 is the triple B*R7. Now B and R7
are removed from the list. After Pass 4, the list (2) reads:

Index, i   Item
    0      ⊢
    1      F
    2      =
    3      A
    4      +
    5      Δ
    6      R6
    7      Δ
    8      Δ
    9      Δ
   10      Δ
   11      Δ
   12      Δ
   13      ⊣

g. Pass 5

Only the condition specified by Operation 2 obtains, for =, A, +, R6, ⊣.
Hence, + is replaced by R4, where R4 is the triple A + R6. Now A and
R6 are removed from the list. After Pass 5, the list (2) reads:

Index, i   Item
    0      ⊢
    1      F
    2      =
    3      Δ
    4      R4
    5      Δ
    6      Δ
    7      Δ
    8      Δ
    9      Δ
   10      Δ
   11      Δ
   12      Δ
   13      ⊣
h. Pass 6

Only the condition specified by Operation 2 obtains, for ⊢, F, =, R4, ⊣.
Hence, = is replaced by R2, where R2 is the triple F = R4. Now F and
R4 are removed from the list. After Pass 6, the list (2) reads:

Index, i   Item
    0      ⊢
    1      Δ
    2      R2
    3      Δ
    4      Δ
    5      Δ
    6      Δ
    7      Δ
    8      Δ
    9      Δ
   10      Δ
   11      Δ
   12      Δ
   13      ⊣


i. Pass 7

Only the condition specified by Operation 4 obtains, for ⊢, R2, ⊣. The
list is emptied. By now the compilation scheme has generated the following
triples in the order indicated:

Index   Triple
R10     C    +      D
R7      0    .ABS.  R10
R6      B    *      R7
R4      A    +      R6
R2      F    =      R4

This is effectively (3) and is read as:

F = R4
  = A + R6
  = A + B*R7
  = A + B*.ABS.R10
  = A + B*.ABS.(C + D)

j. Conclusion

In this example, only one replacement statement was compiled. The
compilation scheme is intended to compile many replacement statements
simultaneously, and the relative speed advantage of parallel over
sequential compilation increases, within machine capacity, with the number
(and complexity) of statements to be compiled.

3. SIMULATION MODEL AND RESULTS

The compiler algorithm developed under Item 2 is designed to be
implemented on a parallel processor. Since no such machine is available for
checking out the algorithm on a sample problem, simulation of parallel
compilation must be effected on a sequential machine. To effect the
simulation, code PARCOM (parallel compilation) was written in FORTRAN IV
for the IBM 1410. Code PARCOM executes the compilation algorithm in
an effectively parallel fashion on a given set of replacement statements.

Code PARCOM operates as follows:

1. A sequence of replacement statements, such as (1), is read into the
   machine.

2. The symbols comprising the statements are examined and classified as
   variables, operators, or blanks; and precedences are assigned to the
   operators.

3. Tests specified by Operations 1, 2, 3, and 4 in Figure X-1 are applied,
   in an effectively parallel fashion, to each of the replacement
   statements.

4. Based on the results of the tests, triples are formed and/or statement
   simplifications are made.

5. Steps 3 and 4 are repeated as necessary.

After each pass through the set of statements, code PARCOM prints out
the triples formed during the pass and the resultant set of statements.
To check the algorithm, the following set of seven replacement
statements was selected:

Number   Statement
  1      F = D + X*[C + X*(B + A*X)] + R.P..ABS.-S
  2      G = [-B + (B*B - 4*A*C).P.2]/(2*A)
  3      H = (A + B).P.[C*(D + E*P/.ABS.X)]
  4      I = A + B + C - D*E                                  (4)
  5      U = F + G
  6      V = H*I
  7      W = -.ABS.(U.P.V)

The results of applying the compilation algorithm to the set (4) are
presented in Tables X-2 through X-13. These tables show (4) initially in the
form of the list (2) and the results of successive passes. Table X-14
shows the entire list of triples generated.

As an example of how the triples represent the replacement statements,
selected from Table X-14 are those triples into which Statement 1 was
decomposed, namely:

Index   Triple
  1     A    *      X
  2     0    -      S
 15     B    +      R1
 16     0    .ABS.  R2
 22     R    .P.    R16
 27     X    *      R15
 30     C    +      R27
 35     X    *      R30
 37     D    +      R35
 40     R37  +      R22
 42     F    =      R40

These combine as

F = R40
  = R37 + R22
  = D + R35 + R.P.R16
  = D + X*R30 + R.P..ABS.R2
  = D + X*(C + R27) + R.P..ABS.-S
  = D + X*(C + X*R15) + R.P..ABS.-S
  = D + X*[C + X*(B + R1)] + R.P..ABS.-S
  = D + X*[C + X*(B + A*X)] + R.P..ABS.-S

The last statement is Statement 1 from the set (4).

The triples generated on each pass correspond to basic arithmetic
operations that can be performed at the time of the pass. Hence, the
compilation algorithm generates triples suitable for parallel execution and
provides a first approach to the recognition of low-level parallelism
within a source program.
TABLE X-7 - RESULTS AFTER PASS 5

Index   Triple              Contributing statement
 30     C    +    R27       1
 31     R23  .P.  2         2
 32     C    *    R24       3
 33     W    =    R29       7

Resulting statements

Index    1      2      3      4      5      6      7
  1      ⊢      ⊢      ⊢      Δ      Δ      Δ      ⊢
  2      F      G      H      Δ      Δ      Δ      Δ
  3      =      =      =      Δ      Δ      Δ      R33
  4      D      (      Δ      Δ      Δ      Δ      Δ
  5      +      R3     Δ      Δ      Δ      Δ      Δ
  6      X      Δ      R7     Δ      Δ      Δ      Δ
  7      *      +      Δ      Δ      Δ      Δ      Δ
  8      (      Δ      Δ      Δ                    Δ
  9      Δ      Δ      .P.    Δ                    Δ
 10      R30    Δ      (      Δ                    Δ
 11      Δ      Δ      Δ      Δ                    ⊣
 12      Δ      Δ      R32    Δ
 13      Δ      Δ      Δ      Δ
 14      Δ      Δ      Δ
 15      Δ      Δ      Δ
 16      Δ      Δ      Δ
 17      Δ      Δ      Δ
 18      Δ      Δ      Δ
 19      Δ      R31    Δ
 20      )      Δ      Δ
 21      +      )      Δ
 22      Δ      /      Δ
 23      R22    Δ      )
 24      Δ      Δ      ⊣
 25      Δ      R6
 26      Δ      Δ
 27      ⊣      Δ
 28             ⊣

NOTE: Δ denotes blank.
TABLE X-8 - RESULTS AFTER PASS 6

Index   Triple              Contributing statement
 34     R3   +    R31       2

Resulting statements

Index    1      2      3      4      5      6      7
  1      ⊢      ⊢      ⊢      Δ      Δ      Δ      Δ
  2      F      G      H      Δ      Δ      Δ      Δ
  3      =      =      =      Δ      Δ      Δ      Δ
  4      D      (      Δ      Δ      Δ      Δ      Δ
  5      +      Δ      Δ      Δ      Δ      Δ      Δ
  6      X      Δ      R7     Δ      Δ      Δ      Δ
  7      *      R34    Δ      Δ      Δ      Δ      Δ
  8      Δ      Δ      Δ      Δ                    Δ
  9      Δ      Δ      .P.    Δ                    Δ
 10      R30    Δ      Δ      Δ                    Δ
 11      Δ      Δ      Δ      Δ                    Δ
 12      Δ      Δ      R32    Δ
 13      Δ      Δ      Δ      Δ
 14      Δ      Δ      Δ
 15      Δ      Δ      Δ
 16      Δ      Δ      Δ
 17      Δ      Δ      Δ
 18      Δ      Δ      Δ
 19      Δ      Δ      Δ
 20      Δ      Δ      Δ
 21      +      )      Δ
 22      Δ      /      Δ
 23      R22    Δ      Δ
 24      Δ      Δ      ⊣
 25      Δ      R6
 26      Δ      Δ
 27      ⊣      Δ
 28             ⊣

NOTE: Δ denotes blank.
The compilation algorithm may generate triples involving variables not
available at the time the triples are formed. For example, Pass 1
generated these triples:

Index   Triple
 12     F    +    G
 13     H    *    I
 14     U    .P.  V

Now the resultant for Triple 12 [13] cannot be calculated until Statements
1 and 2 [3 and 4] are executed. Similarly, the resultant for Triple 14
cannot be calculated until Statements 5 and 6 are executed.

This "premature" generation of triples should pose no problem in the
parallel processor, since Machine I (Appendix III) provides a "compute
on availability" option. That is, if the quantity A + B is to be computed,
the machine will delay the computation until such time as A and B are
available.
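
The effect of a "compute on availability" rule can be illustrated by grouping triples into waves, where a triple joins a wave only once both of its operands exist. The Python sketch below is a toy model, not a description of how Machine I implements the option; the function name `schedule`, the tuple layout (resultant, left, operator, right), and the three-triple example are assumptions chosen for illustration.

```python
def schedule(triples, available):
    """Group triples into waves; a triple runs once both operands exist."""
    available = set(available)
    pending = list(triples)
    waves = []
    while pending:
        ready = [t for t in pending
                 if t[1] in available and t[3] in available]
        if not ready:
            raise ValueError("some operands never become available")
        waves.append([t[0] for t in ready])
        available.update(t[0] for t in ready)      # resultants now exist
        pending = [t for t in pending if t not in ready]
    return waves

# Toy example: the triple for F + G is generated first ("prematurely"),
# but it cannot run until F and G themselves have been computed.
TRIPLES = [
    ("R12", "F", "+", "G"),
    ("F", "A", "+", "B"),
    ("G", "C", "*", "D"),
]
print(schedule(TRIPLES, {"A", "B", "C", "D"}))
```

The order in which triples are generated does not matter in this model; only the data dependencies determine when each one executes.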
1. INTRODUCTION

The notion of parallel compilation is discussed and an algorithm for
effecting it is presented in Appendix X. Subsequent attempts to implement
this algorithm on a parallel processor (Appendices VI and XV) revealed
the desirability of modifying it, because implementation of the algorithm
in its present form may require the initiation of an excessive number of
parallel processor "tasks" (Appendixes XIV and XV) and lead to extremely
cumbersome control programs.

In this appendix, three modifications of the parallel compilation algorithm
are suggested. The first involves the translation of MAD¹* statements
into reverse Polish notation² and testing for triple formation in parallel
with input operations. In the second, the tests for triple formation are
selectively applied as compilation progresses. In the third, the form of
the algorithm is changed.

Throughout this appendix, the class of MAD statements considered is
restricted to replacement type statements involving nonsubscripted
variables.

For a review of parallel compilation, see Appendix X.

* Superior numbers in the text refer to items in the List of References,
Item 5, Page 283.

2. PROBLEMS OF IMPLEMENTATION

The parallel compilation algorithm described in Appendix X specifies a
sequence of passes in which the concurrent application of a set of tests to
a list of replacement statements written in the MAD language results in
all possible triple formations and/or statement simplifications. The
algorithm, as presented in Figure X-1 of Appendix X, suggests the
application of all the tests to all possible sequences of contiguous items
taken 3, 4, and 5 at a time. Investigation, however, reveals that in a
computer implementation of the algorithm such a procedure would be
wasteful of machine capacity. As a case in point, consider the expression

F = A + B*.ABS.(C + D).                                       (1)

On the first pass over (1), the algorithm would combine C, +, D into a
triple, say R1. Clearly, on the next pass it is futile to look at items
preceding the sequence (, R1, ) in the hope of obtaining triple formations
and/or statement simplifications. Hence, to assign machine capacity to such
testing is wasteful.

While this potentially wasteful testing is easily recognized, its remedy
is not. Just how one might program a parallel processor to test those
(and only those!) contiguous sequences of items where the specified
conditions may obtain is a problem dealt with in Item 3, below.

Yet another problem incident to the implementation of the algorithm stems
from the fact that the several tests, being of different complexity, require
different amounts of time for machine execution.

This disparity of execution time leads to some rather severe programming
problems associated with triple formation and/or statement simplification
and maintenance of a valid item list. These programming problems should
not be attributed entirely to the form of the parallel compilation algorithm;
much of the difficulty of program construction is due to the fact that both
the parallel processor and its associated programming language are
radically new and there exists but little experience on which to draw in
the construction of programs.

As a means of obviating the problems mentioned above, the following
possibilities were considered: preliminary translation of input statements
to reverse Polish notation, and innovations in the utilization of the
parallel processor programming language. Both possibilities were
investigated with fruitful results. An unexpected result of the
investigation was the development of a completely new form of the
compilation algorithm. The results are detailed in Item 3.

3.  SUGGESTED  MODIFICATIONS 

a.  General 

In this section, two methods of implementing the parallel compilation
algorithm on the parallel processor are discussed. The first
method involves the conversion of MAD statements into reverse
Polish notation (RPN); the second involves programming innovations.
Subsequently, a restructuring of the parallel compilation algorithm
that offers greatly increased speed of execution is presented.

b.  Reverse  Polish  Notation  (RPN) 

Polish notation refers to a method of representing logical formulas
developed by a Polish mathematician, J. Lukasiewicz.³ The notation
provides an unambiguous sequential specification of the order
of execution of logical and arithmetic operations without the use of
parentheses. The particular form of Polish notation used here is
RPN. In RPN, operators are written after the operands on which
they operate. A necessary condition for the use of RPN is that each
operator be associated with a definite number of operands. For
example, A + B would be written in RPN as AB+, where + would be
understood to operate on the two operands immediately preceding it
(A and B). Operands may in fact be the results of operations. For
example, the expression A+B*C is written in RPN as ABC*+, where
* operates on B and C, giving B*C, and the + operates on A and BC*,
giving A+B*C. The ability of RPN to obviate the use of parentheses
is seen by considering the expression (A + B)*C, which would be
written as AB+C*.

Implementation of the parallel compilation algorithm may be facilitated
by first translating MAD statements into RPN and then constructing
triples from the RPN representation. The MAD statement (1),

F = A + B*.ABS.(C + D),

would appear in RPN as

F A B C D + .ABS. * + =                                       (2)

which, for purposes of compilation, would be interpreted as

Resultant   RPN form     Triple
R1          C D +        C + D
R2          R1 .ABS.     0 .ABS. R1
R3          B R2 *       B * R2
R4          A R3 +       A + R3
R5          F R4 =       F = R4

which is read as

F = R4
  = A + R3
  = A + B*R2
  = A + B*.ABS.R1
  = A + B*.ABS.(C + D)

which is just (1).
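
The interpretation of an RPN string as triples can be carried out with a single operand stack: each operator takes its operands from the top of the stack, and the name of its resultant is pushed back. A minimal Python sketch follows. It is offered as an illustration only; the function name `rpn_to_triples`, the tuple layout, and the operator sets are assumptions, and only the operators of Table X-1 are covered.

```python
UNARY = {".ABS.", "-u"}
BINARY = {".P.", "*", "/", "+", "-", "="}

def rpn_to_triples(rpn):
    """Turn an RPN token list into (resultant, left, operator, right) rows."""
    stack, triples = [], []
    for tok in rpn:
        if tok in UNARY:
            right = stack.pop()
            left = "0"                 # unused left operand of a unary op
        elif tok in BINARY:
            right = stack.pop()
            left = stack.pop()
        else:                          # a variable: just stack it
            stack.append(tok)
            continue
        name = f"R{len(triples) + 1}"
        triples.append((name, left, tok, right))
        stack.append(name)             # the resultant is a new operand
    return triples

RPN = ["F", "A", "B", "C", "D", "+", ".ABS.", "*", "+", "="]
for row in rpn_to_triples(RPN):
    print(row)
```

Since a triple can be emitted the moment its operator is read, this form lends itself to building triples while the rest of the statement is still being read in.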


Figure XI-1 gives a flow chart that specifies a method of translating
MAD statements into RPN. The method depends upon the specification
of precedences for operators and the use of a stack. Table XI-1
gives the required precedences. Note that these precedences differ
slightly from those of Table X-1 in Appendix X in that they include
operators allowing the translation of statements involving logical
operations.
[Figure XI-1 - Translation from MAD to Reverse Polish Notation (flow
chart). Legend: the input string is read left to right; b1, b2, ..., bj, ...
denote the output string (left to right); the stack is listed bottom to
top. P(X) < P(Y) means operator Y has higher precedence than operator X;
P(⊢) should be less than that of any operator; P(unary) > P(binary).
Note: the method works for both unary and binary operators as long as
they are unique (i.e., unary - is distinct from binary - in the input).]
TABLE XI-1 - PRECEDENCE HIERARCHY FOR RPN TRANSLATION

Operator            Description                              Precedence
.ABS., -u           Absolute value, unary minus              Highest
.P.                 Exponentiation
*, /                Multiplication, division
+, -                Addition, subtraction
<, ≤, >, ≥, ≠, =    Relations (with usual interpretations)
¬                   Not
∧                   And
∨                   Or
:=                  Equals (substitution)*
⊢, ⊣, (, )          Begin statement, end statement,          Lowest
                    open parenthesis, close parenthesis

* In the accompanying discussion only := occurs and is written as =.


The translation proceeds basically as follows. Items are taken, left
to right, from the input string. If an item is a variable, it is added,
in a left-to-right fashion, to the output string. If an item is an
operator, its precedence is compared with the precedence of the topmost
item in the stack. If the precedence of the input item is greater than
or equal to the precedence of the stack item, the input item is added
to the top of the stack; otherwise, the input item "sinks" down the
stack until it encounters a stack item whose precedence is no greater
than that of the input item. As an input item sinks in the stack, those
stack items whose precedences exceed that of the input item are
removed from the top of the stack and added to the output string.

Parentheses require special comment. An open parenthesis, when
encountered, is placed on the top of the stack. A close parenthesis,
when encountered, sinks in the stack until a stacked open parenthesis
is encountered, at which point both are removed from further
consideration. Those stack items past which a close parenthesis sinks are
added to the output string.
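
The stack procedure just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a transcription of Figure XI-1: the numeric precedence values are arbitrary apart from their order (Table XI-1), the relational and logical operators are omitted for brevity, substitution is written "=", the statement delimiters are replaced by simply emptying the stack at end of input, and the name `to_rpn` is hypothetical. Note that, with the rule exactly as stated (push when the input precedence is greater than or equal to that of the stack top), operators of equal precedence group from right to left.

```python
# Precedences in the order of Table XI-1 (numeric values are assumptions).
PREC = {".ABS.": 9, "-u": 9, ".P.": 8, "*": 7, "/": 7,
        "+": 6, "-": 6, "=": 1}

def to_rpn(tokens):
    """Translate an infix token list into RPN using one operator stack."""
    out, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)                  # open parenthesis is stacked
        elif tok == ")":
            while stack[-1] != "(":            # sink to the matching "("
                out.append(stack.pop())
            stack.pop()                        # discard the pair
        elif tok in PREC:
            # The operator sinks past stack items of strictly higher
            # precedence, each of which goes to the output string.
            while stack and stack[-1] != "(" and PREC[stack[-1]] > PREC[tok]:
                out.append(stack.pop())
            stack.append(tok)
        else:
            out.append(tok)                    # variable: straight to output
    while stack:                               # end of statement
        out.append(stack.pop())
    return out

STATEMENT = ["F", "=", "A", "+", "B", "*", ".ABS.", "(", "C", "+", "D", ")"]
print(" ".join(to_rpn(STATEMENT)))
```

Changing the strict comparison `>` to `>=` would make equal-precedence binary operators group from left to right instead, at the cost of reversing the order in which a run of consecutive unary operators is emitted.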

Certain comments are also in order regarding relative precedences
of unary and binary operators involved in the translation process. It
is necessary, first of all, for unary operators to have higher
precedence than binary operators. Consider the following cases. The
string a.P.b.P.c may be interpreted either as (a^b)^c or a^(b^c), the
choice being determined by the direction of scan (left to right or right
to left) of the input string and the placement of the equality sign on
the paths out of the precedence test box for operators (see Figure XI-1).
But the string a + b*.ABS.c admits of only one interpretation, namely
a + (b*(.ABS.(c))). With the precedence of unary operators greater
than those of binary operators, the expression a + b*.ABS.c would
translate, correctly, into RPN as abc.ABS.*+. However, if the
unary operation had lower precedence, the translation would produce,
incorrectly, abc*+.ABS., which implies .ABS.(a + b*c).

Further, all unary operators should be of equal precedence. Suppose
one had to translate an input string ... B1 U1 U2 ... Un V B2 ...,
where the B's represent binary operators, the U's unary operators,
and V a variable. (Many unary operators are possible; for example,
-u, .ABS., sin, log, etc.) Correct translation requires that the
output string appear in RPN as ... V Un Un-1 ... U1 ..., which, in
turn, requires that the unary operators have equal precedence.

The process of parallel compilation utilizes RPN in the following
fashion. As MAD statements are read into the parallel processor,
they are converted into RPN. As soon as an operator is inserted into
the RPN string, the parallel processor begins the formation of the
corresponding triple while continuing the read-in process. This
procedure allows compilation to proceed in parallel with input.

The use of RPN in the compilation procedure, as outlined above,
greatly simplifies the task of the programmer in implementing the
compilation algorithm on a parallel processor. However, the use
of RPN in the implementation of the compilation algorithm seems
unavoidably to result in a parallel processor machine program that
is rather slow when applied to a single statement. The slowness of
a program using RPN is due mainly to the sequential nature of the
translation process and the well-known inefficiency accompanying
the constraint of the parallel processor to a single sequential mode
of operation. Although the effects of this can be mitigated by the
concurrent compilation of many statements on the parallel processor,
further efforts were made to discover a simple, easily programmed
method of implementing the compiler algorithm that would
be free of slowness due to inherent sequential characteristics. Some
results of these efforts follow.

c.  Programming  Innovations 

As pointed out in Item 2 above, initial attempts to implement the
parallel compilation algorithm met with severe programming problems,
which in turn led to the use of RPN in the compilation process.

But the use of RPN was not wholly satisfactory. Hence, renewed
efforts were made to construct an efficient compiler program free
of the sequential limitations inherent in the use of RPN. The result
was an implementation whose flow diagram appears in Figure XI-2.

The implementation of the compilation algorithm as given in this
illustration proceeds as follows. Reading left to right, the i-th item in
a MAD replacement statement is denoted by Li; with each Li is associated
a number, Ki, which may be 1, 2, or 3. To each Li is assigned
a parallel processor task (the tasks proceed in parallel) that
determines whether or not Li may be used in triple formation or statement
simplification. Tasks assigned to variables or resultants of triples
are suspended; tasks assigned to operators form triples or statement
simplifications if test conditions are met. If test conditions
fail to be met, the task detects whether failure is due to inappropriate
items on the left side, the right side, or both sides. This information
is stored in Ki and used to re-initiate testing whenever
changes are made in items immediately to the left, right, or both
sides; Ki = 1, 2, or 3 indicates blocking on the left, right, or both,
respectively.

NOTE: In Figure XI-2, the precedences of Table XI-1 are used.
Li±j denotes the j-th nonblank item to the right (+) or left (-) of Li;
that is, "blanks" generated by triple formation and/or statement
simplification are ignored.

The implementation of the parallel compiler algorithm specified by
Figure XI-2 offers a solution to a problem posed in Item 2, namely:
How does one program a parallel processor to test those and only
those contiguous sequences of items where specified test conditions
may obtain? This, of course, allows economical use of machine capacity
in that all futile testing is avoided and tasks are assigned only
to those items where triple formation and/or statement simplification
is occurring. Further, the control features given in Figure XI-2 (for
example, the "wait for both" boxes) allow the maintenance of a valid
and unambiguous item list throughout compilation.

Although the method of Figure XI-2 appears to offer an efficient,
easily programmed implementation of the parallel compilation algorithm,
preliminary timing estimates indicate that it is not significantly
faster than the RPN method.

d.  Restructuring  of  the  Algorithm 

An examination of the compiler algorithm reviewed in Appendix X
reveals that total parallelism of compilation has not been achieved. The
reason is as follows: Although the algorithm specifies on each pass
the concurrent execution of all possible triple formations and/or
statement simplifications, the procedure is limited in that it examines
only contiguous sequences of items taken 3, 4, or 5 at a time.
Hence, several sequential passes of the algorithm are necessary to
complete compilation. This limitation is also present in the
modifications described above.


The question arises as to whether or not an algorithm can be developed
that will concurrently examine each item of a list in terms of
all other items with which it may ultimately be associated in the
compilation process, and specify triple formation and/or statement
simplification in a fashion that achieves optimal compilation speeds.

This  question  is  now  considered. 

Consider the MAD statement

    Z = (A + B)*C + D*((E*F + G)*H + I) ,                (3)

and the precedence assignment

    Symbol        Precedence
    Variable      3 + 4N
    *             2 + 4N
    +             1 + 4N                                 (4)
    =             0 + 4N

where N denotes the number of parentheses sets enclosing the symbol.

Using the effective precedence due to parentheses inclusion, (3) can
be written in a list as

    Symbol  Precedence    Symbol  Precedence    Symbol  Precedence
    Z         3           +         1           +         9
    =         0           D         3           G        11
    A         7           *         2           *         6
    +         5           E        11           H         7
    B         7           *        10           +         5      (5)
    *         2           F        11           I         7
    C         3

(The list reads down each column in turn.)
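The construction of list (5) from statement (3) can be illustrated with a short sketch. It is only an illustration under stated assumptions: Python is used for convenience, variables are taken to be single letters, and the names `BASE`, `VARIABLE`, and `effective_precedences` are hypothetical rather than taken from the report.

```python
# Base precedences from assignment (4); a variable has base precedence 3.
BASE = {"=": 0, "+": 1, "*": 2}
VARIABLE = 3


def effective_precedences(statement):
    """Return (symbol, precedence) pairs, adding 4 per enclosing parenthesis set.

    Assumes single-letter variables and only the operators =, + and *.
    Parentheses themselves are dropped; they only change the nesting depth N.
    """
    depth = 0
    items = []
    for ch in statement.replace(" ", ""):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            items.append((ch, BASE.get(ch, VARIABLE) + 4 * depth))
    return items


LIST_5 = effective_precedences("Z = (A + B)*C + D*((E*F + G)*H + I)")
```

Applied to statement (3), the sketch yields 19 items whose precedences agree with list (5), and no parentheses need be retained.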


The list (5) may be interpreted in graphical form as shown in Figure
XI-3. It will be noted in this graph how precedence modification, due
to parentheses inclusion, separates groups of symbols on the basis of
parenthetical grouping and obviates the need of further retention of
parentheses. The beginning and end points of the statement are
arbitrarily assigned a precedence of $-\infty$.

In the process of compilation, each triple is formed from two ordered
variables (or resultants) and a binary operator (or from one variable or
resultant and a unary operator). Consider a single variable that is
preceded, and followed, by binary operators. In triple formation, this
variable will be included in the triple corresponding to whichever of the
two binary operators is of higher precedence. For example, from the
statement (3) select (A + B)*C, which is stored in the list (5) as

Symbol  Precedence 

A  7 

+  5 

B  7 

*  2 

C  3 


Select  the  variable  B  that  is  preceded  by  +  and  followed  by  *.  Because 
of  parentheses  inclusion,  the  effective  precedence  of  +  is  5  which  is 
greater  than  2,  the  precedence  of  *,  and  thus  B  is  to  be  used  in  the 
triple  corresponding  to  +.  B  will  be  used  in  the  right  side  of  the  triple 
corresponding  to  +  since  +  is  on  the  left  side  of  B. 

As shown in Figure XI-3, B is at a peak on the graph, as are all
variables. To find the operator in whose triple a variable will be
included is quite simple. One simply "looks down the slopes" to find
the "nearest" (numerically greatest) operator, with which the variable
is then associated. In the event that a variable has operators of
equal precedence on either side, the right operator will be considered
as having greater precedence. This convention conforms to the required
interpretation of precedences in a concatenation of unary operators.

Now, since each operator will generate a triple, consideration must
be given to the placement of the corresponding resultant in other
triples. An examination of Figure XI-3 will quickly suggest how
resultants can be combined into triples. Each operator corresponds
to a "valley" (perhaps a "plateau" in the case of concatenated unary
operators). But each operator also represents a triple and corresponding
resultant that must be treated as a variable. Hence, for each
operator one must search the graph, both to the right and the left,
until on each side an operator of lesser precedence is encountered.

The resultant is then associated with the triple corresponding to the
operator of higher precedence (rightmost operator in the case of equality).

Basically, then, triple formation proceeds in a leveling process that
consists of combining variable "peaks" of a graph such as that given
in Figure XI-3 into "valley" triples whose resultants are then treated
as variables and the process iterated. Parallelism is injected into
the procedure by concurrently associating each variable and resultant
with the triple in which it will ultimately be located.
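The association rule can be made concrete with a small sequential sketch. It is a hedged illustration only: the report intends the associations to be made concurrently, whereas this Python loop visits the items one at a time, and the names `LIST_5` and `owning_triples` are hypothetical. For every item it looks left and right for the nearest item of strictly lesser precedence and picks the one of higher precedence, preferring the right-hand one on a tie.

```python
import math

# List (5): the 19 symbols of statement (3) with their effective precedences.
LIST_5 = [("Z", 3), ("=", 0), ("A", 7), ("+", 5), ("B", 7), ("*", 2), ("C", 3),
          ("+", 1), ("D", 3), ("*", 2), ("E", 11), ("*", 10), ("F", 11),
          ("+", 9), ("G", 11), ("*", 6), ("H", 7), ("+", 5), ("I", 7)]


def owning_triples(items):
    """For each item, return the index of the operator whose triple receives it.

    Index -1 or len(items) stands for the statement end points, which carry
    precedence minus infinity.
    """
    prec = [-math.inf] + [p for _, p in items] + [-math.inf]
    owners = []
    for k in range(1, len(prec) - 1):
        left = k - 1
        while prec[left] >= prec[k]:      # skip items that are not lesser
            left -= 1
        right = k + 1
        while prec[right] >= prec[k]:
            right += 1
        chosen = right if prec[right] >= prec[left] else left  # tie goes right
        owners.append(chosen - 1)         # back to 0-based positions in items
    return owners


OWNERS = owning_triples(LIST_5)
```

For statement (3), B (position 4) is assigned to the + at position 3, and the resultant of that + is assigned to the * at position 5, matching the discussion of (A + B)*C above.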

The  method  of  compiling  MAD  statements  into  triples  outlined  above 
involves  searches  for  items  of  lesser  precedence  both  to  the  right  and 
left  of  a  given  item  from  a  list  such  as  (5).  These  searches  can  be 
accomplished  more  easily  if  such  a  list  is  stored  as  a  part  of  an  ex¬ 
panded  list  defined  as  follows: 

1.  Let n symbols, such as those in the list (5), be
    indexed 1, 2, ..., n.

2.  Let p be the integer such that $2^{p-1} < n + 2 \le 2^p$.
    Construct a list of items indexed

        2, 3, 4, ..., $2^p$, $2^p + 1$, $2^p + 2$, ...,
        $2^p + n$, $2^p + n + 1$                              (6)

    where n denotes the number of symbols from a
    list such as (5) and $2^p + i$ denotes the index for
    symbol i from a list such as (5). The other indices
    of (6) denote dummy items that will be
    assigned a precedence defined below. These
    dummy items will be of value in treeing the
    searches for items to the left or right of a given
    item that are of lesser precedence. Figures XI-4
    and XI-5 present methods for the requisite
    searches.

3.  Using the precedences given in (4), or in some
    similar but more comprehensive list, denote by
    M(j) the precedence of the jth item of (6). For
    $2^p + 1 \le j \le 2^p + n$, $M(j) = M(2^p + i)$, the
    precedence of the ith symbol from a list such as (5).

4.  Let $M(2^p) = M(2^p + n + 1) = -\infty$. That is, drive
    the endpoints of a graph such as that given in
    Figure XI-3 to minus infinity.

5.  For items of the list (6) indexed j, $2 \le j < 2^p$,
    define precedence as follows:

    a.  If M(2j) and M(2j + 1) are defined, let M(j) =
        min [M(2j), M(2j + 1)]

    b.  If only M(2j) is defined, let M(j) = M(2j)

    c.  If neither M(2j) nor M(2j + 1) is defined, then
        M(j) is undefined

    Undefined items will not affect the search procedures
    specified in Figures XI-4 and XI-5.

Table XI-2 illustrates the compilation procedure described above.
The statement (3) in the list form (5) is used as an example.

An example of precedence determination for an expanded list such as
(6) is given in Table XI-3. Again the statement (3) is used as an
example. The number of symbols in statement (3), excluding parentheses,
is n = 19; hence, for p such that $2^{p-1} < n + 2 \le 2^p$, p = 5.

This accounts for the expanded list index running from 2 through 52
in Table XI-3.
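The five construction steps can be sketched in a few lines. This is an illustrative sketch rather than the report's program: Python is assumed, undefined items are represented by `None`, and the names `build_expanded` and `PRECS_3` are hypothetical.

```python
import math

# Precedences of the 19 symbols of statement (3), in list (5) order.
PRECS_3 = [3, 0, 7, 5, 7, 2, 3, 1, 3, 2, 11, 10, 11, 9, 11, 6, 7, 5, 7]


def build_expanded(precs):
    """Build the expanded list M of steps 1 to 5; returns (M, p).

    M is indexed directly by the list index j; undefined entries are None.
    """
    n = len(precs)
    p = (n + 1).bit_length()            # smallest p with n + 2 <= 2**p
    base = 2 ** p
    M = [None] * (2 * base)
    M[base] = -math.inf                 # step 4: left end point
    for i, value in enumerate(precs, start=1):
        M[base + i] = value             # step 3: symbol i sits at 2**p + i
    M[base + n + 1] = -math.inf         # step 4: right end point
    for j in range(base - 1, 1, -1):    # step 5: dummy items, 2 <= j < 2**p
        defined = [m for m in (M[2 * j], M[2 * j + 1]) if m is not None]
        M[j] = min(defined) if defined else None
    return M, p


M3, P3 = build_expanded(PRECS_3)
```

For statement (3) this gives p = 5 and defined entries at indices 2 through 52, with each dummy item holding the minimum precedence found beneath it.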
TABLE XI-3 - EXAMPLE OF LIST EXPANSION AND
PRECEDENCE DETERMINATION FOR STATEMENT (3)

Statement

    Z = (A + B)*C + D*((E*F + G)*H + I)                  (3)

Precedence table

    Symbol      Precedence
    Variable    3
    *           2
    +           1
    =           0

    Add 4 for every set of parentheses enclosing a symbol.

n = 19;  $2^{p-1} < n + 2 \le 2^p$  implies  p = 5

(A blank M(i) entry denotes an undefined item.)

     i   M(i)          i   M(i)  Symbol      i   M(i)  Symbol
     2   -∞           19     2              36     5   +
     3   -∞           20     1              37     7   B)
     4   -∞           21     2              38     2   *
     5     1          22    10              39     3   C
     6   -∞           23     9              40     1   +
     7                24     6              41     3   D
     8   -∞           25     5              42     2   *
     9     2          26   -∞               43    11   ((E
    10     1          27                    44    10   *
    11     9          28                    45    11   F
    12     5          29                    46     9   +
    13   -∞           30                    47    11   G)
    14                31                    48     6   *
    15                32   -∞               49     7   H
    16   -∞           33     3   Z          50     5   +
    17     0          34     0   =          51     7   I)
    18     5          35     7   (A         52   -∞
Figures XI-4 and XI-5, respectively, present flow charts of search
procedures for finding $L_j$ and $R_j$, where $L_j$ [$R_j$] denotes the
rightmost [leftmost] symbol to the left [right] of symbol j having
precedence less than that of symbol j. These search procedures can be
executed concurrently in approximately $2 \log_2 n$ steps, where n is the
number of symbols (less parentheses) from a statement such as (3).
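Since the flow charts of Figures XI-4 and XI-5 are not reproduced here, the following sketch shows one way such a search could use the expanded list; it is an assumption about the general approach, not a transcription of the figures. The search climbs from an item toward the top of the list until a neighboring subtree holds a lesser precedence, then descends into that subtree, so each search takes on the order of 2p steps. Python is assumed and the names `build_expanded`, `nearest_lesser`, and `PRECS_3` are hypothetical.

```python
import math

PRECS_3 = [3, 0, 7, 5, 7, 2, 3, 1, 3, 2, 11, 10, 11, 9, 11, 6, 7, 5, 7]


def build_expanded(precs):
    """Expanded list: leaves at 2**p .. 2**p + n + 1, dummy items hold minima."""
    n = len(precs)
    p = (n + 1).bit_length()
    base = 2 ** p
    M = [None] * (2 * base)
    M[base] = M[base + n + 1] = -math.inf
    for i, value in enumerate(precs, start=1):
        M[base + i] = value
    for j in range(base - 1, 1, -1):
        defined = [m for m in (M[2 * j], M[2 * j + 1]) if m is not None]
        M[j] = min(defined) if defined else None
    return M, p


def nearest_lesser(M, p, leaf, toward_right):
    """Index of the nearest item on one side of `leaf` with lesser precedence."""
    target = M[leaf]
    node = leaf
    while node > 1:                                   # climb
        is_left_child = node % 2 == 0
        if is_left_child == toward_right:
            sibling = node + 1 if toward_right else node - 1
            if M[sibling] is not None and M[sibling] < target:
                node = sibling
                break
        node //= 2
    else:
        return None                                   # nothing lesser on that side
    while node < 2 ** p:                              # descend toward `leaf`
        near, far = (2 * node, 2 * node + 1) if toward_right else (2 * node + 1, 2 * node)
        node = near if (M[near] is not None and M[near] < target) else far
    return node


M3, P3 = build_expanded(PRECS_3)
```

For example, for the * at index 38 the sketch returns index 34 (the = sign) on the left and index 40 (the + of precedence 1) on the right.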


4.  CONCLUSIONS 

In this appendix, three modifications of the parallel compilation algorithm
have been presented. The first involved preliminary translation of
replacement statements into reverse Polish notation; the second involved
innovations in the use of the parallel processor programming language;
the third specified a restructuring of the algorithm.

The first two modifications were easily programmable but did not result
in significant speed advantages. The restructuring offered by the third
modification appears to provide a maximal utilization of the parallelism
inherent in the compilation process. The restructured compilation algorithm
has not yet been programmed for a parallel processor, nor has it been
subjected to a detailed review. It is recommended that further study of the
restructured parallel compilation algorithm be carried out.
A COMPUTATION ALGORITHM FOR THE IBM 7090


1.  INTRODUCTION 

This work was performed under Contract AF30(602)-3550, Advanced Computer
Organization Study. A sequential algorithm for compiling substitution
expressions was written so that a comparative analysis to a parallel
machine could be made. The IBM 7090 computer was chosen as the sequential
computer on which to make the comparisons.

Reference was first made to a paper by Arden, Galler, and Graham.^b An
extensive analysis was made of the compilation of a general complex
substitution expression, and further investigation led to a general
derivation of a timing equation for compiling simplified expressions.

The  comparative  analysis  of  sequential  versus  parallel  compilation  is  made 
in  Appendix  XIII. 

2.  DESCRIPTION  OF  ALGORITHM 
a.  General 

In this IBM 7090 compiler algorithm, reasonable assumptions have
been made as to what will be the format of the input string or
substitution expressions. Due to the hierarchy of the operators, the
operands are considered as having a zero level of hierarchy. The operators
listed below constitute an unalterable basic set whose meaning (semantic
content) is used in the decomposition of expressions. Boolean
expressions will not be considered here, but it is easily seen that to
include them one would merely extend the limits of the algorithm. All
arithmetic operators except the exponential will be generated into an
object program in single precision arithmetic. The overall program
package will require input/output routines and an exponential
subroutine (EXP) that is available in the MAD compiler. The symbol "-" is
used in statements to indicate both the unary (one operand) and the
binary (two operands) operator; the context indicates which is intended.

^a IBM Reference Manual, 7090 Data Processing System. Poughkeepsie, N. Y.,
International Business Machines Corporation, August 1961.

^b Arden, B. W.; Galler, B. A.; and Graham, R. M.: An Algorithm for
Translating Boolean Expressions. Ann Arbor, Mich., University of Michigan,
October 1961.

Certain arithmetic operations must be compiled first in order to
execute the object program correctly. It is for this reason that a certain
level of hierarchy is assigned to each of the input string of items. In
this algorithm, the chosen hierarchy (or precedence) is as shown in
Table XII-1.

TABLE XII-1 - HIERARCHY OF INPUT ITEMS

    Item     Definition              Precedence   Class
    .ABS.    Absolute value          12           Operator
    -u       Unary minus             11           Operator
    ↑        Exponentiation          10           Operator
    *        Multiplication           9           Operator
    /        Division                 8           Operator
    -        Binary minus             7           Operator
    +        Plus                     6           Operator
    =        Substitution             5           Operator
    (        Left parenthesis         4           Operator
    )        Right parenthesis        3           Operator
    ⊢        Left terminator          2           Operator
    ⊣        Right terminator         1           Operator
    V        Constant or variable     0           Operand

If the input items used in an expression are compiled in accordance
with this hierarchy, there will result an object program in 7090 code
that when executed will accomplish the desired arithmetic operations.

Throughout the discussion, the following symbols are used:

    PREC($S_\beta$)        =  Precedence of current operator
    PREC($L_{\alpha-2}$)   =  Precedence of previous operator
    $S_\beta$              =  Current referenced input string item
    $L_\alpha$             =  Current referenced intermediate string item
    $\beta$                =  Input string index
    $\alpha$               =  Intermediate string index
    $R_i$                  =  Current triple resultant pointer
    $M_{i1}$               =  First generated instruction of a triple
    $M_{i2}$               =  Second generated instruction of a triple
    $M_{i3}$               =  Third generated instruction of a triple
    $M_{i4}$               =  Fourth generated instruction of a triple


The compiler discussed here involves an input pass that will assign
the constant/variable machine locations and the correct hierarchy to
the input string. Once the input string is in this form, the various
items are placed on a list (SLIST). The major function of the algorithm
involves a single scanning of the expression from right to left
(up the SLIST) and retaining operands, operations, relations, etc.
on an intermediate list (LLIST) until an operation or relation ($S_\beta$)
occurs which is of lower precedence than the immediately preceding
operation on the list ($L_{\alpha-2}$).

When such an operation or relation as $S_\beta$ is encountered,
$L_{\alpha-2}$ is compiled. Except for the case of exponentiation, the
compilation consists of creating object coding of three instructions
($M_{i1}$, $M_{i2}$, and $M_{i3}$). The three instructions created for
a + b are $M_{i1}$ = CLA a, $M_{i2}$ = FAD b, and $M_{i3}$ = STO $R_i$,
where $R_i$ is the current location of the result of performing the
operation a + b, which will be called a triple for this discussion.
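The right-to-left scan can be illustrated for parenthesis-free expressions of binary operators. The sketch below is an assumption-laden illustration, not the 7090 program: it is written in Python, it records each triple as a tuple instead of generating machine instructions, it uses only the binary-operator precedences of Table XII-1, and the names `PREC` and `compile_simple` are hypothetical.

```python
# Binary-operator precedences taken from Table XII-1.
PREC = {"=": 5, "+": 6, "-": 7, "/": 8, "*": 9}


def compile_simple(tokens):
    """Scan right to left, keeping items on an intermediate list (LLIST).

    Whenever the current operator has lower precedence than the previous
    operator on the list, the previous operator is compiled into a triple
    (result, left operand, operator, right operand).
    """
    llist = []       # intermediate list; the most recent item is last
    triples = []

    def compile_top():
        left = llist.pop()       # operand scanned most recently (leftmost)
        op = llist.pop()
        right = llist.pop()
        result = "R%d" % (len(triples) + 1)
        triples.append((result, left, op, right))
        llist.append(result)     # the resultant is treated as an operand

    for tok in reversed(tokens):
        if tok in PREC:
            while len(llist) >= 3 and PREC[tok] < PREC[llist[-2]]:
                compile_top()
        llist.append(tok)
    while len(llist) >= 3:       # end of statement: compile what remains
        compile_top()
    return triples


TRIPLES = compile_simple(["V", "=", "A", "*", "B", "+", "C", "*", "D"])
```

For V = A*B + C*D the sketch produces four triples: C*D, A*B, their sum, and the final substitution into V.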

b.  Input  String  Discussion 

The input routine must be capable of reading a statement into the 7090
and assigning relative addresses to the various constants and variables
(that is, count the different operands). If a variable or constant
(operand) is detected, the sign bit (bit S) of the machine word is made
one, the address field (bits 21 to 35) references its location, and the
decrement (bits 3 to 17) is zero. An operand format is sketched below.

    [Operand word format. Bit positions: S, 1, 2, 3-17, 21-35.
     S = 1; decrement (bits 3-17) = zero; address (bits 21-35) =
     relative location.]

To execute the various arithmetic operations in the proper sequence,
a certain level of hierarchy is assigned to each operator (see Page 284).
In the case of operators, only the hierarchy mentioned on Page 284 is
necessary to generate the correct object code, and it must be contained
in the decrement (bits 3 to 17) field of each item word. An operator
format is sketched below.

    [Operator word format. Bit positions: S, 1, 2, 3-17, 21-35.
     Decrement (bits 3-17) = precedence of the operator.]
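The two item formats can be mimicked by packing fields into a 36-bit integer. The sketch is illustrative only: Python is assumed, the function names are hypothetical, and the choice of a zero sign bit and zero address in an operator word is an assumption, since the text specifies only that the precedence occupies the decrement.

```python
def pack_word(sign, decrement, address):
    """Pack a 36-bit word: sign in bit S, decrement in bits 3-17, address in bits 21-35."""
    assert sign in (0, 1)
    assert 0 <= decrement < 2 ** 15 and 0 <= address < 2 ** 15
    return (sign << 35) | (decrement << 18) | address


def operand_word(relative_location):
    # Operand: sign bit one, decrement zero, address = relative location.
    return pack_word(1, 0, relative_location)


def operator_word(precedence):
    # Operator: precedence in the decrement (sign and address assumed zero).
    return pack_word(0, precedence, 0)


def decrement_of(word):
    """Extract the 15-bit decrement field (bits 3-17)."""
    return (word >> 18) & 0o77777


def is_operand(word):
    """An item is an operand when its sign bit is one."""
    return (word >> 35) & 1 == 1
```

With this packing, an operand at relative location 5 prints in octal as 400000000005 and the multiplication operator (precedence 9) as 000011000000.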

For example, should there be a complex expression such as

    Z = (A*B + C*D) * (E↑F + .ABS. I) + (J↑K - L/M)/(-P + Q*R) ,

the input routine would be expected to produce the array shown in
Figure XII-1 (see Appendixes I and III).

c.  Input  Data  Discussion 

After the object program has been generated, the DATIN subroutine
must be capable of storing floating point values in the memory
locations selected by the INPUT subroutine. These values will start at
POOL+1, etc. The first location, POOL, is zero. If the input data are
in integer form, provision must be made to convert them to floating
point numbers.

[Figure XII-1 - Format of an Input String of Items. Panels: input string
(octal), with decrement and address fields; constant, variable, and
resultant pool, whose locations are assigned during the input routine and
filled with floating point values by the DATIN routine.]

d.  Flow Diagram General Description

In addition to the terminology used in the general discussion of this
section, the following additional symbology pertains to the flow
diagrams:

    →  means "is transferred to"

    $R_i + 1 \to R_i$  means "$R_i$ is increased by one"

The flow diagram in Figure XII-2 is by no means complete from a systems
standpoint, as mentioned above. The operational system will require
input/output and, if desired, object program listing routines.

e.  Subroutine Descriptions

(1) Absolute Value

The absolute value may be detected by (1) the initial scan, or (2)
if preceded by a ↑-triple. If a twelve is detected in the decrement
field of an item on the input string (SLIST) during the initial scan,
the subroutine SCOPEM + 2 is entered. This subroutine will produce
three instructions: (1) CLA POOL, which makes the AC register
equal to zero when executed; (2) FAM $L_{\alpha-1}$, which adds to
the AC register the magnitude of the contents of the previous item
on the L list, providing it is not an operator; and (3) STO $R_i$,
which stores the final result of the operation, .ABS. $L_{\alpha-1}$.

The location of the result after executing these instructions is put
on the L list to keep account of the intermediate steps. The result
pointer is increased by one and the program returns to the start to
examine the next item. If the absolute value is detected while
examining a ↑-triple, then the three mentioned instructions are
formed before the ↑-triple instructions.

Figure XII-2 - Compiler General Flow Diagram

(2) Unary/Binary Minus

If a seven is detected in the decrement field of an item on the
input string (SLIST) during the initial scan, the subroutine MINUS
is entered. This subroutine will check the next item on the list
($S_{\beta-1}$) to see if it is an operator. If the next item
($S_{\beta-1}$) is not an operator, then the minus operation is binary
and program control goes to the OTHERS subroutine.

If the next item ($S_{\beta-1}$) is an operator, then the minus is a
unary operation and three instructions are formed: (1) CLA POOL,
which makes the contents of the AC register equal to zero once
this instruction is executed at object time; (2) FSB $L_{\alpha-1}$,
which will subtract from zero (AC register) the contents of
$L_{\alpha-1}$; and (3) STO $R_i$, which will store the result of the
unary operation in an intermediate location.

Then $\alpha$ and $\beta$ are set to examine the next input item and
compiler control returns to START.
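The unary/binary decision amounts to inspecting the item immediately to the left of the minus sign, which is the next item met in a right-to-left scan. A minimal sketch follows, in Python and with hypothetical names. One point is an assumption beyond the text: a right parenthesis is treated here as closing an operand, so that a minus following ")" is classed as binary.

```python
# Items that, when found to the left of a minus sign, make the minus unary.
# Excluding ")" from this set is an assumption of this sketch.
OPENING_ITEMS = {"=", "+", "-", "*", "/", "^", ".ABS.", "("}


def classify_minus(items, k):
    """Classify the minus sign at position k of a left-to-right item list."""
    assert items[k] == "-"
    if k == 0 or items[k - 1] in OPENING_ITEMS:
        return "unary"
    return "binary"


DIVISOR = ["(", "-", "P", "+", "Q", "*", "R", ")"]      # (-P + Q*R)
DIVIDEND = ["(", "J", "^", "K", "-", "L", "/", "M", ")"]  # (J^K - L/M), ^ for the up-arrow
```

In the divisor (-P + Q*R) the minus follows "(" and is unary; in (J↑K - L/M) it follows the operand K and is binary.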


(3) Exponentiation

If an eleven is detected in the decrement field of an item on the
input string (SLIST) during the initial scan, the POWERS subroutine
is entered. This subroutine checks the next operator ($S_{\beta-1}$)
and, if it is an absolute value operator, performs a compilation of
an absolute value operation first.

After the absolute value compilation, or if $S_{\beta-1}$ is not an
absolute value operator, four instructions are generated for the EXP
subroutine calling sequence:

1.  LDQ $S_{\beta-1}$ - the value to be raised to a
    power is placed in the MQ register

2.  CLA $L_{\alpha-1}$ - the exponent to raise a value
    to a power is placed in the AC register

3.  TSX 1 EXP - transfer and set index 1 to the
    current object program address and go to
    subroutine EXP

4.  STQ $R_i$ - store the result of EXP in an
    intermediate location.

Then $\alpha$ and $\beta$ are set to examine the next item on the S list
and compiler control returns to START.

(4) Parentheses

Once the operator "(" is detected (a four in the decrement field),
compiler control stays in the LFTPRN loop until all operations
within the parentheses have been compiled. The loop stop code
is of course ")" (a three in the decrement field), and at this time
compiler control goes to RTPRN and sets $\alpha$ and $\beta$ to examine
the next input item. The final resultant within the parentheses ($R_i$)
is moved up one position on the L list by the RTPRN routine to
completely eliminate the parentheses. For an illustration of this
case, see Page 316.

(5) Terminators

Once the ⊢ operator is detected (a two in the decrement field),
compiler control stays in the TERM loop until all the operations
of the expression have been compiled. The loop stop code for
this routine is ⊣ (a one in the decrement field). Once the condition
⊢ $R_i$ ⊣ exists and there are no more expressions to be compiled,
control goes to list and/or execute the program. For an
illustration of this case, see Page 316.

(6) Others

The OTHERS subroutine is the most general subroutine in the
compiler, and effectively the input string could be defined in such
a manner that OTHERS would be the only subroutine necessary
for the compiler. In the next section, the point is brought out as
to how expressions could be written that require extensive use of
the OTHERS routine. This subroutine first will compare the
precedence of the current operator with the precedence of the
previous operator; that is, PREC($S_\beta$) < PREC($L_{\alpha-2}$). If the
current operator precedence is less than the previous operator
precedence, a set of object code instructions using the previous
operator is formed. If the precedence of the current operator is not
less than the precedence of the previous operator, it is added to the
intermediate (L) list and the next item on the S list is examined.

3.  A  SIMPLIFIED  APPROACH  TO  COMPILING  SUBSTITUTION  EXPRESSIONS 
a.  General 

In writing substitution statements, many compilers try to reduce the
complexity of compilation by placing restrictions on the programmer.
Sometimes, these restrictions are a set of programming rules that
will discourage the use of parentheses or encourage the writing of
unary operations in a prescribed manner. Using the expression
on Page 288, the programmer could conceivably have written:

    V = A*B + C*D               (1)
    W = E↑F + φ .ABS. I         (2)
    X = J↑K - L/M               (3)
    Y = Q*R - P                 (4)
    Z = V*W + X/Y               (5)

where φ is a location containing zero. It is noted that each simplified
expression contains an odd number of items. As a general rule, when
writing expressions in the above simplified manner, the compilation
time is thought to be reduced considerably. Such is not the case when
using this algorithm for compilation. The reason is the shorter loops
(see Page 290) for

    Unary minus              -  44 cycles ,
    ↑ (POWERS)               -  45 cycles ,
    Normal absolute value    -  37 cycles ,

whereas the general OTHERS loop requires 50 cycles. The OTHERS
routine, discussed in greater detail on Page 294, is used for a general
timing equation derivation later in this section.

b.  Compilation of Complex Expressions

A complete simulation of the compilation of the expression given on
Page 288 is illustrated in great detail beginning on Page 316. The
overall compilation time for the complex expression is found to be
1270 cycles or

    T(total) = 1270(2.18) μsec
             = 2.7686 msec .

The reason for such detail is to give the reader an insight into what is
involved in order to do a compilation.

c.  Derivation of a General Timing Equation for Compiling Simplified
    Expressions

From the simplified expressions on Page 295 and the flow diagram on
Page 297, it can be noted that there are (n - 1)/2 operators, (n + 1)/2
operands, and (n - 1)/2 triples, where n is the number of items in the
expression. It is seen from the general timing equations beginning on
Page 116 that the compiler requires 11 cycles to acknowledge and
transfer an operand from the input string to the intermediate list, so
the time (in cycles) to transfer operands is

    $t_{(operands)} = 11 \, \frac{n + 1}{2}$ .                    (6)

Figure XII-3 - An Arithmetic Operator General Flow Diagram

Using the data from the timing equations, it follows that there are
generally

    $t_{(triples)} = 50 \, \frac{n - 1}{2}$                       (7)

cycles per simplified expression.

The timing equations show that 15 cycles per operator are required, or

    $t_{(operators)} = \frac{15(n - 1)}{2}$ .

Using the three equations, the total time for each expression is

    $t_{(total)} = t_{(operands)} + t_{(triples)} + t_{(operators)}$ ,

which simplifies to 38n - 27 cycles.

If n = 9, as in the simplified expressions 1, 2, 3, or 5 on Page 295,
then

    $t_{(total)}(1) = 38(9) - 27 = 315$ cycles .

In simplified expression (4) on Page 295, n = 7 and

    $t_{(total)}(4) = 38(7) - 27 = 239$ cycles ,

so the overall time required to compile the simplified expressions is

    T(total) = t(total)(1) + t(total)(2) + t(total)(3) + t(total)(4) + t(total)(5)

             = 315 + 315 + 315 + 239 + 315 = 1499 cycles .
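The arithmetic of the timing derivation can be checked directly. The short sketch below is illustrative only; Python is assumed, the function and variable names are hypothetical, and the per-item cycle counts (11, 50, and 15) are those quoted in the derivation.

```python
def cycles_simplified(n):
    """Cycles to compile one simplified expression of n items (n odd)."""
    operands = 11 * (n + 1) // 2     # 11 cycles per operand, (n + 1)/2 operands
    triples = 50 * (n - 1) // 2      # 50 cycles per triple, (n - 1)/2 triples
    operators = 15 * (n - 1) // 2    # 15 cycles per operator, (n - 1)/2 operators
    return operands + triples + operators


# Item counts of simplified expressions (1) through (5).
ITEM_COUNTS = [9, 9, 9, 7, 9]
TOTAL_SIMPLIFIED = sum(cycles_simplified(n) for n in ITEM_COUNTS)

# Cycle count quoted for the single complex expression.
TOTAL_COMPLEX = 1270
```

The five simplified expressions together need 1499 cycles by this estimate, which exceeds the 1270 cycles quoted for compiling the single complex expression.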

4.  CHARTS, ASSEMBLY LISTING, AND TIMING EQUATIONS

The compiler flow charts, an assembly listing of a compiler, the general
timing equations, and a simulation of a compilation are presented on the
following pages.

Figure XII-4 - Compiler Flow Chart
ASSEMBLY LISTING OF COMPILER

(CYC = cycles for the instruction; SUM = cycle subtotal printed beside a
group of instructions; SEQ = card sequence number.)

LABEL    OP    ADDRESS, TAG, DECREMENT   CYC  SUM   COMMENTS                         SEQ
BEGIN    CLA   RST                        2         SET                              00010
         STO   RPNTR                      2         RESULT POINTER                   00020
         LAC   1ST,1                      2    9    SET OBJECT                       00030
         SXA   MPTR                       2         PROGRAM POINTER                  00040
SET      TSX   INPUT,1                    1         INPUT ONE STATEMENT              00050
         AXC   1,1                        1         1 (COMPLEMENT) → α = X1          00060
         CLA   RTERM                      2                                          00070
         STO   LLIST                      2    12   ⊣ → L(0)                         00080
         CLA   LTERM                      2                                          00090
         STO   SLIST                      2         ⊢ → S(0)                         00100
         LAC   CHAR,2                     2         CHAR COUNT (COMPLEMENT) → β      00110
START    CLA   SLIST,2                    2    4                                     00120
         PDC   0,4                        1         PREC(S(β)) (COMPLEMENT) → X4     00130
         TRA   JLIST,4                    1    5                                     00140
JLIST    TRA   VARCON                     1         VARIABLE OR CONSTANT             00150
         TRA   VARCON                     1         ⊣ = 1)10 (DECREMENT)             00160
         TRA   TERM                       1         ⊢ = 2)10 (DECREMENT)             00170
         TRA   VARCON                     1         ) = 3)10 (DECREMENT)             00180
         TRA   LFTPRN                     1         ( = 4)10 (DECREMENT)             00190
         TRA   OTHERS                     1         = = 5)10 (DECREMENT)             00200
         TRA   OTHERS                     1         + = 6)10 (DECREMENT)             00210
         TRA   MINUS                      1         - = 7)10 (DECREMENT)             00220
         TRA   OTHERS                     1         / = 8)10 (DECREMENT)             00230
         TRA   OTHERS                     1         * = 9)10 (DECREMENT)             00240
         TRA   POWERS                     1         ↑ = 11)10                        00250
         TRA   SCOPEM+2                   1         .ABS. = 12)10 (DECREMENT)        00260
VARCON   STO   LLIST,1                    2         S(β) → L(α)                      00270
         TXI   NEXT,1,-1                  2    6    α + 1 → α                        00280
NEXT     TXI   START,2,1                  2         β - 1 → β                        00290
LFTPRN   CAS   LLIST-2,1                  3         PREC(S(β)) : PREC(L(α-2))        00300
         TRA   RTPRN                      1                                          00310
         HTR   BEGIN                      2                                          00320
         STO   TEMP                       2         S(β) → TEMP                      00330
         CLA   LLIST-2,1                  2         L(α-2) → AC                      00340
         PDC   0,4                        1    11                                    00350
         SXA   TEMPX,2                    2         X2 → TEMPX                       00360
         LXA   MPTR,2                     2         MPTR → X2                        00370
         CLA   DUMMYA,4                   2                                          00380
         ADM   LLIST-1,1                  2    6    MANUFACTURE AN                   00390
         STO   0,2                        2         INSTRUCTION                      00400
         CLA   DUMMYB,4                   2                                          00410
         ADM   LLIST-3,1                  2    6    MAIN INSTRUCTION                 00420
         STO   1,2                        2                                          00430
         CLA   RPNTR                      2                                          00440
         STO   LLIST-3,1                  2    4    R(i) → L(α-3)                    00450
         SSP                              2         MAKE AC PLUS                     00460
         ADD   DUMMYR,4                   2    4                                     00470
         STO   2,2                        2         STO R(i) → M(i3)                 00480
RTPRN


OTHERS 


HUM 


ASSEMBLY  LISTING  OF  COMPILER  (Continued) 


s 


OP 

ADDRESS,  » AO, 

DECREMENT 

COMMENTS 

TXJ 

*+l.  2.  3 

2 

M  »  3  -  M 

00440 

SXA 

MPTR  2 

2 

X2  -  MPTR 

00400 

LXA 

TEMPX,  2 

2 

n  -  a 

00S10 

CLA 

RPNTR 

2 

00320 

SUB 

ONE 

2 

16 

•R,  ■  l  «  -IRj  ♦  1, 

00330 

STO 

RPNTR 

Z 

00340 

CLA 

TEMP 

Z 

00350 

TXi 

LFTPRN,  1.  Z 

2 

a  • -  a 

00360 

CLA 

LUST  -1.  1 

2 

(RjS  CASE 

00570 

STO 

LUST  -Z,  1 

: 

b 

00380 

TXI 

NEXT,  1.  1 

2 

a  -  1  -  « 

00340 

CAS 

LUST  -2.  1 

3 

00600 

TRA 

VARCON 

1 

AC  >  Y  GO  ADD 

00610 

TRA 

VARCON 

1 

AC  •  Y  TO  UST 

00620 

STO 

TEMP 

2' 

Sj  -  TEMP 

00630 

CLA 

LUST  *Z.  ) 

* 

00640 

PDC 

4.  4 

1 

9 

AC  (DECREMENT  COMPLEMENT)  X4 

00660 

SXA 

TEMPX,  Z 

2 

XZI0)  -  TEMPX 

00670 

UCA 

MFTR,  2 

2  . 

MPTR  -  X2 

00680 

CLA 

DUMMY  A.  4 

2 

00640 

A  DM 

LUST  -1.  1 

Z 

6 

MANUFACTURE  AN 

'*700 

STO 

4.  z 

z 

INSTRUCTION 

00710 

CLA 

DUMMY  8  4 

z 

MAIN 

00720 

ADM 

LUST  *J.  1 

2 

6 

INSTRUCTION 

00730 

STO 

I.  Z 

2 

00740 

CLA 

STO 

RPNTR 

LUST  -1 

2 

2 

|  4 

Ri  -  L..I 

00730 

00760 

ssp 

3 

2 

MAKE  AC* 

007  ro 

ADD 

OUMMYR.  4 

2  1 

4 

STO  -  Rt  -  Mn 

00780 

STO 

Z.  2 

2  j 

0O740 

TXl 

•♦t.  2.  -3 

2 

X2*  J  -  X2 

00300 

SXA 

MPTR.  2 

2 

00310 

LXA 

T EMI  X  2 

2 

00820 

CLA 

RPNTR 

1 

00830 

SUB 

ONE 

2 

■  i6 

-Rj-  I  •  -IRj  *  1) 

00840 

STO 

RPHTR 

Z 

00830 

CLA 

TEMP 

2 

*;  *AC 
•  *!-** 

00860 

TXl 

OTHERS.  I.  Z 

2 

00870 

CAS 

LUST  *Z.  1 

3 

TEST  PRLC 

008*0 

TRA 

EX*eUT 

l 

UST  AND/OR  EXECUTE  ^4 

00**0 

KTR 

SEOIN 

I 

ERROR  t  6  S 

00400 

STO 

TEMP 

2 

»  -  ;emp 

00*10 

CLA 

LUST  2  1 

i 

00*20 

roc 

i  4 

l 

00  430 

SXA 

TEMPX  1 

2 

1! 

X1(J|  -  TEMPX 

00*40 

LXA 

MPTR  I 

1 

MPTR  -  Xi 

0*4*0 

CLA 

DUMMY  A  4 

1 

004*0 

A  DM 

LUST  t 

1 

MANUFACTURE 

00*70 

STO 

i  i 

i 

0**00 
ASSEMBLY LISTING OF COMPILER (Continued)


MINUS 


SCOPE* 


row  cu 


s 


OP 

ADDRESS,  TAG,  DECREMENT 

COMMENTS 

CLA 

DUMMTB,  *  2 

00410 

ADM 

LUST  -I.  1  2 

S 

STORE  FINAL  RESULT  IN 

01000 

STO 

1,2  2 

SUBSCRIPTED  VARIABLE 

01010 

CLA 

STO 

RPNTR  2 

LUST  *3  2 

4 

*2-V> 

01020 

010M 

sap 

3  2 

MAKE  AO* 

01040 

ADI) 

DUMMTB.  4  2 

4 

SAVE  NEXT  Rt  LOCATION 

010S0 

STC 

2.  2  2 

TOR  THE  NEXT  EXPRESSION 

01040 

TX1 

•♦1.  2.  *1  2 

>a  -  3  -  X2 

010T0 

SXA 

MPTR,  2  2 

X2  —MPTR 

01000 

LXA 

TEMPX.  2  2 

S  -  X2 

01010 

CLA 

RPNTR  2 

IS 

01100 

SUB 

ONE  2 

INCREASE  Rj  POINTER 

OHIO 

STO 

RPNTR  2 

OHIO 

CLA 

TEMP  2 

»j-AC 
•  -  2  —  « 

OHM 

TX1 

TERM.  1.  2  2 

01140 

CLA 

SLUT  -1.  2  2 

*S-I  **  AC 

01ISC 

TPL 

SCOPE*  2 

III-)  OPERAND  K  <♦»  OPERATOR 

01140 

CLA 

SLUT.  2  2 

> 

S*  -AC 

oino 

TBA 

OTHERS  ] 

W 

onto 

CLA 

UNMtN  2 

PREC  -  AC 

OHIO 

PDC 

4-  *  1 

T 

DEC  -  X4 

01200 

SXA 

TEMPX.  2  2 

XI (R  -  TEMPX 

01210 

LXA 

MPTR.  2  2 

MPTR  -  Xt 

01220 

CLA 

CLEAR  2  j 

4 

C LA-POOL  -M,^ 

012S0 

STO 

4.  2  2 

01240 

CLA 

DUMMTB.  4  2 

omo 

ADM 

LUST  -1.  1  2 

4 

FSB  L,.,  -  M„ 

01240 

•STO 

1.2  2 

0UT0 

CLA 

RPNTR  2 

4 

R,  -  AC 

01240 

STO 

LUST  -1.  1  2 

*4  "*  L«-l 

01210 

SSP 

1  2 

OUOO 

ADD 

DUMMTB.  4  2 

S 

STO  •  Rt  -  Mu 

OHIO 

STO 

2.  2  2 

OHIO 

TX1 

•♦1.  i.  •'  2 

oim 

SXA 

MPTR.  1  2 

XX  -MPTR 

01340 

LXA 

TEMPX.  2  1 

OHM 

CLY 

RPNTR  I 

1) 

01340 

SLB 

ONE  2 

INCREASE  r,  pointer 

01330 

STO 

RPNTR  I 

01300 

TXI 

START.  2.  1  1 

S  *  1  -  # 

OHM 

CLA 

3UST  ■*.  1  t 

*3*1  ”AC 

41400 

CAS 

absval  i 

TEST  .YU 

01410 

MTS 

BEGIN  1 

ERROR  MALT 

01410 

TBA 

ABTCKM  | 

01430 

CLA 

DUMMY  A.  4  2  1 

4 

0(400 

adm' 

SLUT  -1.  2  i  j 

01400 

SXA 

TEMPX  2  I  I 

XI (U  -TEMPX 

01440 

LXA 

MPTB.  1  |  | 

S 

MPTR  -  XI 

41430 

STO 

4  1  1  I 

LO°  •  v«  - 

01400 

-30/- 
ASSEMBLY LISTING OF COMPILER (Continued)


4 


OP 

ADDRESS,  TAG, 

DFCREMFNT 

COMMENTS 

CLA 

DUMMvb.  4 

2 

01490 

ADM 

LUST  -1,  1 

2 

6 

CLA  -  L0.j  -  b*a 

0150C 

STO 

1.  2 

2 

01510 

CLA 

TRANSX 

2 

4 

TiX  •  EXP  -Mjj 

01520 

STO 

2,  2 

2 

0153U 

CLA 

RPNTR 

2 

0 1 40 

STO 

LUST  l 

2 

*» 

R)  -  L„  , 

01550 

SSP 

3 

2  1 

MAKE* 

01560 

ADD 

DUMMY R.  4 

2 

6 

STO  -  R.  M., 
i  i4 

015^0 

STO 

3,  2 

2  ! 

01580 

TX1 

*M.  2.  4 

2 

01590 

SXA 

MPTR,  2 

2 

• 

X2  -MPNTR 

01600 

LLa 

T  EM » 'X.  2 

2 

-  X2 

01610 

CLA 

RPNTR 

2 

4 

01620 

.-•OB 

ONE 

2 

INCREASE  Rf  POINTER 

01630 

STO 

RPNTR 

2 

01640 

TXI 

STAPT,  2.  2 

2 

01650 

ABFOPJvl 

o  La 

DUMMYB.  M2 

2  3 

01660 

ADM 

SLIST  -1,2 

2 

FAM- 

Cl  670 

SXA 

TEMPX,  2 

2 

10 

01680 

l.XA 

MPTR.  2 

2 

01690 

STO 

1.  2 

2 

01700 

CL\ 

CLEAR 

2 

4 

CLA  -  POOL  -  Mu 

01710 

STO 

4.  2 

2 

01720 

C’ 

DJMMYR.  M2 

2 

: 

0J730 

RPNTR 

► 

STO  -  8.  -  Mt1 

01740 

A  iJ 

2.  2 

2 

01750 

TX1 

*M.  2  -3 

2 

Cl  760 

SXA 

MPTR.  2 

2 

01770 

LXA 

1  EM.’X.  2 

2, 

10 

01780 

CLa 

RPNTR 

2 

C.‘  790 

STO 

5U'»T  -2.  2 

: 

Ri  ^  V; 

01800 

SUB 

ONE 

2 

0 1 3 10 

STO 

RPNTR 

2 

01820 

TXT 

POWERS  *4.  2.  1 

2 

02830 

EXECUT 

TSX 

DAT  AIN,  l 

EXECUT  -  XI 

01840 

TRA 

ML1ST 

01850 

MUST 

BSS 

OBJECT  PROGRAM 

01860 

RST 

OCT 

•i04 

RELATIVE  DISPLACEMENT 

Cl 8  .‘0 

RPNTR 

A  vs 

RESULT  POINTER 

01880 

1ST 

MUST 

SET  FOR  MPNTR 

01 390 

MPT* 

RS3 

OBJECT  PROGRAM  POINTER 

014C0 

RTtRM 

OCT 

►  OP  4  1 

RIGHT  TERMINATOR 

01410 

DUMMY  A 

NOP 

POOL 

VARIABLE  OR  CONSTANT  • 

01411 

NOP 

POOL 

■4 

01412 

NOP 

wt 

► 

91411 

NOP 

) 

01414 

SOi* 

POOL 

( 

01M8 

CIA 

POOL 

• 

01416 

308 
ASSEMBLY LISTING OF COMPILER (Continued)


8 


OP 

ADDRESS,  TAG,  DECREMENT 

COMMENTS 

CLA 

POOL 

♦  POOL  ♦  L  , 

0191? 

CLA 

POOL 

- 

0191# 

CLA 

POOL 

/ 

01919 

LDQ 

POOL 

• 

01920 

CLA 

POOL 

- 

01930 

LDQ 

POOL 

t 

01940 

CLA 

POOL 

.  ABS. 

01950 

DUMMYB 

NOP 

POOL 

VARIABLE/CONSTANT 

01960 

HOP 

POOL 

■« 

01970 

NOP 

POOL 

h 

01980 

NOP 

POOL 

) 

01990 

NOP 

POOL 

( 

02000 

STO 

POOL 

s 

02010 

FAD 

POOL 

♦  POOL  ♦  L  - 

02020 

a-3 

FSB 

POOL 

- 

02030 

FPP 

POOL 

/ 

02040 

FMP 

POOL 

* 

0205C 

FSB 

POOL 

- 

02060 

LDA 

POOL 

r 

02070 

FAM 

POOL 

.  ABS. 

02080 

DUMMYR 

NOP 

POOL 

VARIABLE/CONSTANT 

020)0 

NOP 

POOL 

-4 

02100 

NOP 

POOL 

K 

02110 

NOP 

POOL 

) 

02120 

NOP 

POOL 

i 

02130 

NOP 

POOL 

-  POOL  *  100(R4) 

02140 

STO 

POOL 

♦ 

02150 

STO 

POOL 

02160 

STO 

POOL 

/ 

02170 

STQ 

POOL 

• 

02180 

f  TO 

POOL 

•u 

02 !  9C 

STO 

POOL 

T 

02200 

STO 

POOL 

ABS. 

02210 

LL  1ST 

BSS 

Mi 

INTERMEDIATE  STORAGE 

02220 

LTERM 

►  OP  i.  l 

LEFT  TERMINATOR  OPERATOR 

022  K 

SL1ST 

ess 

Mi 

INPUT  STkiNC 

02*  40 

CHAR 

BSS 

CHARACTER  COUNT 

02250 

TEMP 

nss 

TEMPORARY  STORAGE 

0W6O 

TEMPX 

ess 

INDEX  STORAGE 

0i'47C 

ONE 

OCT 

i 

nm 

UNMiW 

V  OP  i  li 

UNARY  OPERATOR 

022*) 

Cl  EAR 

'LA 

POOL 

oj  too 

POOL 

?RO 

EERO 

02  M0 

BS> 

2W 

constant  /variable  'results 

02  W0 

ABSVAL 

SB*  OP  i  U 

02  540 

TRANSX 

TSX 

EXP  > 

TRANSFER  TO  SUBROUTINE 

02  550 

INPUT 

a  y.M 

1NPV7  ONE  STATEMENT 

ROUiiNE 

OJH* 

OATAJN 

READ  DATA  ROL  TINE 

02  5  TO 

TRA 

1.  i 

GO  TO  L  *  1  {MAIN  PROGRAM- 

02  540 
GENERAL TIMING EQUATIONS


Initialization

Time to set the triple resultant pointer and object program pointer:

    BEGIN_00010 - BEGIN_00040 = 8 cycles

Time to set the program to compile one expression:

    SET_00050 - SET_00110 = 12 cycles

Time to acknowledge one item:

    START_00120 - JLIST_00150-00260 = 5 cycles


Variable or Constant

Time to transfer a variable or constant from the SLIST to the LLIST:

    VARCON_00270 - NEXT_00290 = 6 cycles


Left Parenthesis

Time to discover PREC("(") < PREC(S_{j-1}):

    LFTPRN_00300 = 3 cycles

Time to manufacture first instruction:

    LFTPRN_00330 - LFTPRN_00400 = 13 cycles

Time to manufacture second instruction:

    LFTPRN_00410 - LFTPRN_00430 = 6 cycles

Time to store resultant pointer in LLIST -3:

    LFTPRN_00440 - LFTPRN_00450 = 4 cycles

Time to manufacture third instruction:

    LFTPRN_00460 - LFTPRN_00480 = [illegible] cycles

Time to increase R_i pointer by one, object program counter by three, and
decrement α by two:

    LFTPRN_00490 - LFTPRN_00560 = 16 cycles

Time to form a triple set of instructions:

    Σ LFTPRN_00300 - LFTPRN_00560 = 50 cycles

Time to put R_i at L_{α-1} back to L_{α-2} for the (R_i) case:

    RTPRN_00570 - RTPRN_00590 = 11 cycles


Others

Time to discover PREC(S_j) < PREC(S_{j-1}):

    OTHERS_00600 = 3 cycles if yes
                   4 cycles if no

Time to manufacture the first instruction:

    OTHERS_00630 - OTHERS_00710 = 15 cycles

Time to manufacture the second instruction:

    OTHERS_00720 - OTHERS_00740 = 6 cycles

Time to store resultant pointer in LLIST -3:

    OTHERS_00750 - OTHERS_00760 = 4 cycles

Time to manufacture the third instruction:

    OTHERS_00770 - OTHERS_00790 = 6 cycles

Time to increase R_i pointer by one, object program pointer by three, and
decrement α by two:

    OTHERS_00800 - OTHERS_00870 = 16 cycles

Time to form a triple set of instructions:

    Σ OTHERS_00600 - OTHERS_00870 = 50 cycles


Terminator

Time to test PREC(⊢) < PREC(S_{j-1}):

    TERM_00880 = 3 cycles

Time to manufacture the first instruction:

    TERM_00910 - TERM_00980 = 13 cycles

Time to manufacture the second instruction:

    TERM_00990 - TERM_01010 = 6 cycles

Time to put the R_i pointer into LLIST -3:

    TERM_01020 - TERM_01030 = 4 cycles

Time to manufacture the third instruction:

    TERM_01040 - TERM_01060 = 6 cycles

Time to increase R_i pointer by one, object program pointer by three, and
decrement α by two:

    TERM_01070 - TERM_01140 = [illegible] cycles

Time to form a triple set of instructions:

    Σ TERM_00880 - TERM_01140 = 50 cycles


Minus

Time to discover binary minus:

    MINUS_01150 - MINUS_01180 = 7 cycles

Time to form binary minus triple set of instructions:

    MINUS_01150 - OTHERS_00870 = 50 + 7 = 57 cycles

Time to discover unary minus:

    MINUS_01150 - MINUS_01160 = 4 cycles

Time to manufacture the first instruction:

    SCOPEM_01190 - SCOPEM_01240 = 11 cycles

Time to manufacture the second instruction:

    SCOPEM_01250 - SCOPEM_01270 = 6 cycles

Time to put the resultant pointer into LLIST -1:

    SCOPEM_01280 - SCOPEM_01290 = 4 cycles

Time to manufacture the third instruction:

    SCOPEM_01300 - SCOPEM_01320 = 6 cycles

Time to increase R_i pointer by one, object program counter by three, and
decrement β by one:

    SCOPEM_01330 - SCOPEM_01390 = 13 cycles

Time to form a unary minus triple set of instructions:

    Σ MINUS_01150 - SCOPEM_01390 = 44 cycles


Powers

Time to test for absolute value:

    POWERS_01400 - POWERS_01410 = 5 cycles

Time to manufacture the first instruction:

    POWERS_01440 - POWERS_01480 = 10 cycles

Time to manufacture the second instruction:

    POWERS_01490 - POWERS_01510 = 6 cycles

Time to manufacture the third instruction:

    POWERS_01520 - POWERS_01530 = 4 cycles

Time to put the R_i pointer on LLIST -1:

    POWERS_01560 - POWERS_01580 = 6 cycles

Time to increase the R_i pointer by one, the program pointer by four, and
decrease β by two:

    POWERS_01590 - POWERS_01650 = 14 cycles

Time to form a calling sequence of four instructions:

    Σ POWERS_01400 - POWERS_01650 = [illegible] cycles


.ABS. V ↑ V Condition

Time if S_{β+1} = .ABS.:

    POWERS_01400 - POWERS_01430 = 6 cycles

Time to form the first instruction:

    ABFORM_01650 - ABFORM_01700 = 10 cycles

Time to form the second instruction:

    ABFORM_01710 - ABFORM_01720 = 4 cycles

Time to form the third instruction:

    ABFORM_01730 - ABFORM_01750 = 6 cycles

Time to put R_i on SLIST -2 and increase the object program counter by
three:

    ABFORM_01760 - ABFORM_01800 = 10 cycles

Time to increase R_i pointer by one and decrease β by one:

    ABFORM_01810 - ABFORM_01830 = 6 cycles

Time to form a triple set of instructions for this case of absolute value:

    Σ POWERS_01400 - ABFORM_01830 = 42 cycles


Normal Absolute Value

    Σ SCOPEM_01210 - SCOPEM_01390 = 37 cycles
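
The totals quoted in the timing equations can be checked by summing their component times. The short Python sketch below does this for three of the cases. The dictionary and key names are illustrative only, and the 11-cycle figure used for the first instruction of the unary-minus case is an assumption (it is the value that makes the components add up to the stated 44 cycles), not a figure read directly from the scan.

```python
# Component times in cycles, transcribed from the timing equations.
# Dictionary and key names are illustrative and do not appear in the report.
OTHERS = {
    "discover precedence": 3,
    "first instruction": 15,
    "second instruction": 6,
    "store resultant pointer": 4,
    "third instruction": 6,
    "advance pointers": 16,
}

ABS_CASE = {
    "detect .ABS.": 6,
    "first instruction": 10,
    "second instruction": 4,
    "third instruction": 6,
    "put R_i on SLIST and advance program counter": 10,
    "advance R_i pointer and beta": 6,
}

UNARY_MINUS = {
    "discover unary minus": 4,
    "first instruction": 11,  # ASSUMED: illegible in the scan
    "second instruction": 6,
    "store resultant pointer": 4,
    "third instruction": 6,
    "advance pointers": 13,
}


def total(components):
    """Return the total time, in cycles, of one compilation case."""
    return sum(components.values())


others_total = total(OTHERS)
# A binary minus costs 7 cycles to recognize, then follows the OTHERS path.
binary_minus_total = others_total + 7
abs_total = total(ABS_CASE)
unary_minus_total = total(UNARY_MINUS)

print(others_total, binary_minus_total, abs_total, unary_minus_total)
```

Under these assumptions the sums agree with the stated totals of 50, 57, 42, and 44 cycles.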
SIMULATION OF A COMPILATION

Status of SList

    S_β      Accumulative time to        S_β      Accumulative time to
             reference item (cycles)              reference item (cycles)

    ⊢        1216                        )        597
    Z        1205                        +        532
    =        1090                        (        465
    (         973                        J        420
    A         962                        ↑        411
    *         947                        K        400
    B         936                        -        328
    +         871                        L        317
    C         860                        /        302
    *         845                        M        291
    D         834                        )        280
    )         823                        /        265
    *         808                        (        198
    (         741                        -        149
    E         696                        P        138
    ↑         687                        +         73
    F         676                        Q         62
    +         661                        *         47
    .ABS.     619                        R         36
    I         608                        )         25
Status of LList at Various Intervals


a 

L 

a 

Accumulative 
time  (cycles) 

S0— L 

0  o 

a 

Lcr 

S3~La 

a 

L* 

Sj9  Lo 

0 

H 

20 

0 

H 

20 

0 

H 

20 

1 

) 

31 

l 

) 

31 

1 

) 

31 

2 

R 

42 

2 

Ri 

101 

2 

Ri 

101 

3 

* 

57 

3 

© 

3 

+ 

133 

4 

Q 

68 

4 

p 

144 

5 

© 

5 

0 

PREC(+) 
Q*R  —  Rj 

'  PREC(*) 

PREC(+)  >  PREC(") ") 

S0-l 

’P 

is  an  operator 

a 

La 

S0~La 

a 

La 

S3 — L 

0  a 

a 

La 

S0  L0 

0 

H 

20 

0 

H 

20 

0 

-1 

20 

l 

) 

31 

l 

) 

31 

l 

R3 

260 

2 

Ri 

101 

2 

R3 

226 

2 

/ 

275 

3 

* 

133 

3 

0 

3 

) 

286 

4 

r2 

174 

4 

M 

297 

5 

5 

/ 

312 

6 

L 

323 

7 

O 

PRECVV)  <  PREC( "+  ■') 

R2  *  Rj— R, 

V 

B 

! 

PREC(-)  ^ 
L/M— R4 

-u  <  PREC(/) 
Status of LList at Various Intervals (Continued)


a 

L 

S„-*L 

a 

L 

a 

L 

Sfl— L 

a 

i3  a 

a 

/3  a 

a 

P  < 

0 

H 

20 

0 

20 

0 

H 

20 

1 

R3 

260 

l 

R3 

260 

l 

R3 

260 

2 

/ 

275 

2 

/ 

275 

2 

/ 

275 

3 

) 

286 

3 

) 

286 

3 

) 

286 

4 

R4 

363 

4 

R4 

363 

4 

R6 

493 

5 

395 

5 

395 

5 

© 

6 

K 

406 

6 

R5 

440 

7 

© 

7 

© 

Sfl-2 

^  .ABS. 

PRECO")  < 

PREC(-) 

(R.)  CASE 

;.jtk— r5 

'  ■  R5 

-r4- 

R6 

•••V 

-2 

Of  L  S_  *  L 

a  0  a 

a 

L 

a 

srLo 

a 

L 

a 

S„— *L 
i3  i 

0  ^  20 

0 

H 

20 

0 

H 

20 

1  R3  260 

l 

R7 

560 

l 

R7 

560 

2  /  275 

2 

* 

592 

2 

4* 

592 

3  R.  527 

3 

) 

603 

3 

) 

603 

4  © 

4 

(.  ABS, 

614 

4 

R8 

637 

5 

5 

+ 

671 

6 

F 

■  -n. 

682 

PREC(+)  <  PREC(/) 

•  R6/R3“RJ 

.  ABS. 

7 

V; 

.  t 

4  .  ABS. 

.\ETF  —  R9 
Status of LList at Various Intervals (Continued)


a 

La 

a 

La 

S;-L„ 

a 

Lo 

/3  Of 

0 

H 

20 

0 

H 

20 

0 

-1 

20 

l 

R7 

560 

l 

R7 

560 

l 

R7 

560 

2 

+ 

592 

2 

+ 

592 

2 

+ 

592 

3 

) 

603 

3 

) 

603 

3 

R10 

803 

4 

R8 

637 

4 

R10 

769 

4 

* 

818 

5 

+ 

671 

5 

<0 

5 

) 

829 

6 

R9 

716 

6 

D 

840 

7 

V 

7 

* 

855 

8 

C 

866 

9 

f 

PREC{ "(")  <  PREC(+) 

(A.)  CASE 

PREC(+)  <  PREC(*) 

"V 

R8- 

R10 

* 

L0“l”*L0 

-2 

*  * 

C*D—Rn 

u 

L« 

S0  La 

a 

La 

s3  L. 

a 

L. 

P  o 

0 

20 

0 

H 

20 

0 

H 

20 

1 

R- 

560 

1 

R7 

560 

l 

R, 

1 

560 

2 

■f 

592 

2 

+ 

592 

2 

592 

3 

R10 

803 

3 

R10 

803 

3 

R10 

803 

4 

* 

618 

4 

* 

818 

4 

* 

81o 

5 

) 

829 

) 

829 

5 

) 

829 

6 

7 

rh 

♦ 

899 

931 

6 

7 

rh 

♦ 

899 

931 

6 

7 

R13 

© 

1051 

8 

B 

942 

8 

RI2 

1001 

9 

* 

957 

9 

® 

10 

A 

968 

11 

,1 

PREC{ 

T’>  <  PREC{*1 

pREcrn  <  precm 

(R-)CASE 

A*B 

R 

12  *  Rl  l”~ 

R1S 

S 

I-l“~L0-2 
Status of LList at Various Intervals (Continued)


a 

La 

Sfl— L 
p  a 

a 

La 

sf~La 

a 

La 

sr*L, 

0 

H 

20 

0 

20 

0 

H 

20 

l 

56p 

l 

R7 

560 

1 

R15 

1168 

2 

+ 

592 

2 

+ 

592 

2 

= 

1200 

3 

R10 

803 

3 

R14 

© 

1118 

3 

Z 

1211 

4 

* 

818 

4 

4 

© 

5 

R13 

1085 

6 


PREC(=)  <  PREC(*) 
R13*R10"^R14 


PREC(=)  v.  PREC(+5 
R14  +  R7~~R1£ 


"TERM"  CASE 
Z  =  R15  RI6 


0  H  1270 

1  Rl6  1244 

2  © 


Object Program

MLIST

    MNEMONIC  TAG  ADDRESS       ACCUMULATIVE    COMMENTS
                                 TIME (CYCLES)

    LDQ            POOL + 14        91           C(Q) → MQ
    FMP            POOL + 15        97           C(Q)*C(R) → MQ
    STQ            POOL + 100      107           MQ → R1
    CLA            POOL            164           ZERO → AC
    FSB            POOL + 13       170           C(AC) - C(P) → AC
    STO            POOL + 101      180           AC → R2
    CLA            POOL + 101      216           C(R2) → AC
    FAD            POOL + 100      222           C(R2) + C(R1) → AC
    STO            POOL + 102      232           C(AC) → R3
    CLA            POOL + 11       353           C(L) → AC
    FDP            POOL + 12       359           C(L)/C(M) → MQ
    STQ            POOL + 103      369           C(MQ) → R4
    LDQ            POOL + 9        426           C(J) → MQ
    CLA            POOL + 10       432           C(K) → AC
    TSX       1    EXP             436           GO TO SUBROUTINE "EXP"
    STQ            POOL + 104      446           RETURN HERE C(X1) + 1
    CLA            POOL + 104      483           C(R5) → AC
    FSB            POOL + 103      489           C(R5) - C(R4) → AC
    STO            POOL + 105      499           C(AC) → R6
    CLA            POOL + 105      550           C(R6) → AC
    FDP            POOL + 102      556           C(R6)/C(R3) → MQ
    STQ            POOL + 106      566           C(MQ) → R7
    CLA            POOL            627           ZERO → AC
    FAM            POOL + 8        633           |C(I)| → AC
    STO            POOL + 107      643           AC → R8
    LDQ            POOL + 6        702           C(E) → MQ
    CLA            POOL + 7        708           C(F) → AC
    TSX       1    EXP             712           GO TO SUBROUTINE "EXP"
    STQ            POOL + 108      722           RETURN C(MQ) → R9
    CLA            POOL + 108      759           C(R9) → AC
    FAD            POOL + 107      765           C(R9) + C(R8) → AC
    STO            POOL + 109      775           C(AC) → R10
    LDQ            POOL + 4        889           C(C) → MQ
    FMP            POOL + 5        895           C(C)*C(D) → MQ
    STQ            POOL + 110      905           MQ → R11
    LDQ            POOL + 2        991           C(A) → MQ
    FMP            POOL + 3        997           C(A)*C(B) → MQ
    STQ            POOL + 111     1007           MQ → R12
    CLA            POOL + 111     1011           R12 → AC
    FAD            POOL + 110     1047           C(R12) + C(R11) → AC
    STO            POOL + 112     1057           AC → R13
    LDQ            POOL + 112     1108           C(R13) → MQ
    FMP            POOL + 109     1114           C(R13)*C(R10) → MQ
    STQ            POOL + 113     1124           MQ → R14
    CLA            POOL + 113     1158           C(R14) → AC
    FAD            POOL + 106     1164           C(R14) + C(R7) → AC
    STO            POOL + 114     1174           AC → R15
    CLA            POOL + 114     1234           C(R15) → AC
    STO            POOL + 1       1240           C(AC) → Z
    NOP            POOL + 115     1250           R_i + 1

POOL

    LOCATION      CONTENTS             USE

    POOL          00 00 00 00 00 00    ZERO
    POOL + 1      00 00 00 00 00 00    Reserved for Z
    POOL + 2      00 00 00 00 00 00    Reserved for A
    POOL + 3      00 00 00 00 00 00    Reserved for B
    POOL + 4      00 00 00 00 00 00    Reserved for C
    POOL + 5      00 00 00 00 00 00    Reserved for D
    POOL + 6      00 00 00 00 00 00    Reserved for E
    POOL + 7      00 00 00 00 00 00    Reserved for F
    POOL + 8      00 00 00 00 00 00    Reserved for I
    POOL + 9      00 00 00 00 00 00    Reserved for J
    POOL + 10     00 00 00 00 00 00    Reserved for K
    POOL + 11     00 00 00 00 00 00    Reserved for L
    POOL + 12     00 00 00 00 00 00    Reserved for M
    POOL + 13     00 00 00 00 00 00    Reserved for P
    POOL + 14     00 00 00 00 00 00    Reserved for Q
    POOL + 15     00 00 00 00 00 00    Reserved for R
    POOL + 100    00 00 00 00 00 00    Result of Q*R = R1
    POOL + 101    00 00 00 00 00 00    -P = R2
    POOL + 102    00 00 00 00 00 00    R2 + R1 = R3
    POOL + 103    00 00 00 00 00 00    L/M = R4
    POOL + 104    00 00 00 00 00 00    J↑K = R5
    POOL + 105    00 00 00 00 00 00    R5 - R4 = R6
    POOL + 106    00 00 00 00 00 00    R6/R3 = R7
    POOL + 107    00 00 00 00 00 00    .ABS. I = R8
    POOL + 108    00 00 00 00 00 00    E↑F = R9
    POOL + 109    00 00 00 00 00 00    R9 + R8 = R10
    POOL + 110    00 00 00 00 00 00    C*D = R11
    POOL + 111    00 00 00 00 00 00    A*B = R12
    POOL + 112    00 00 00 00 00 00    R12 + R11 = R13
    POOL + 113    00 00 00 00 00 00    R13*R10 = R14
    POOL + 114    00 00 00 00 00 00    R14 + R7 = R15

5.  CONCLUSIONS 


When the compilation time of the simplified expressions is compared with that
of the equivalent complex expression, it is readily seen that writing complex
expressions is markedly advantageous. For most compilers, however, this is not
the case, and the programmer can usually compile simplified expressions with
greater speed. The compiler size could be reduced considerably if only the
simplified type of expression were to be compiled but, of course, this would
mean a sacrifice of compilation time. If the generated object program
(Appendix D) is examined, it is readily seen that it is not an optimum object
program. To generate optimum object code, additional tests could be included
in the compiler, but these changes would increase compilation time.

Many of today's compilers are designed with the generation of an optimum
object program in mind. To accomplish this, the substitution expressions are
scanned several times, looking for:
1. Repetitive triples, i.e.:
x = a + b ,
y = C + D*(a + b) ,
where a + b is the repetitive triple

2. Commutative triples, i.e.:
x = a + b ,
y = C + D*(b + a) ,
where a + b = b + a, etc.

3. Redundant object code, i.e.:

STO R_i        AC → R_i
CLA R_i        R_i → AC

In the present algorithm, none of the above considerations was employed. For
large programs and programs that are developed with long-range use in mind,
the above features should be considered in the writing of a compiler. Such a
compiler would be useful in a production-type computer environment. These
features sometimes result in long compilation times in exchange for short
object program execution times. If the object program is to be used only once
or twice, it can be absurd to spend as much as twice the object program
execution time on compilation.
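
The three optimizations listed above were not part of the report's algorithm. As a rough illustration of what such extra passes might look like, the Python sketch below detects repetitive and commutative triples and removes a load that immediately follows a store to the same location. All function names and the tuple layout `(operator, left, right, result)` are hypothetical and chosen only for this example.

```python
COMMUTATIVE = {"+", "*"}


def canonical(op, a, b):
    """Order the operands of a commutative operator so a+b and b+a match."""
    if op in COMMUTATIVE and b < a:
        a, b = b, a
    return (op, a, b)


def share_triples(triples):
    """Drop triples that recompute an earlier triple.

    triples: list of (op, left, right, result) tuples.
    Returns (kept, alias), where alias maps a dropped result name to the
    earlier result that already holds the same value.
    """
    seen, alias, kept = {}, {}, []
    for op, a, b, result in triples:
        # Rewrite operands that refer to a dropped result.
        a, b = alias.get(a, a), alias.get(b, b)
        key = canonical(op, a, b)
        if key in seen:
            alias[result] = seen[key]
        else:
            seen[key] = result
            kept.append((op, a, b, result))
    return kept, alias


def drop_redundant_loads(code):
    """Remove a CLA X that immediately follows STO X (AC already holds X)."""
    out = []
    for instr in code:
        if out and instr[0] == "CLA" and out[-1] == ("STO", instr[1]):
            continue
        out.append(instr)
    return out


# x = a + b ; y = C + D*(b + a)
triples = [
    ("+", "a", "b", "R1"),
    ("+", "b", "a", "R2"),
    ("*", "D", "R2", "R3"),
    ("+", "C", "R3", "R4"),
]
kept, alias = share_triples(triples)

code = [("FAD", "b"), ("STO", "R1"), ("CLA", "R1"), ("FAD", "C")]
short_code = drop_redundant_loads(code)
```

Each pass is an additional scan, which is the trade-off the text describes: the object program gets shorter, while compilation takes longer.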

The accumulative compilation time for compiling the expression

Z = (A*B + C*D)*(E↑F + .ABS. I) + (J↑K - L/M)/(-P + Q*R)

was found to be 2.37 msec and, for an equivalent set of simplified expressions
(using a more general subroutine), 3.27 msec. The algorithm consists of
184 IBM 7090 instructions and [illegible] locations for constants and working
storage.
1. INTRODUCTION

This is the programming report for the compilation problem programmed
for Machine II as a portion of the work performed under Contract
AF30(602)-3550. The compilation problem was the programming of a portion
of the Michigan Algorithm Decoder (MAD). The section of MAD chosen
for demonstration was the compilation of substitution statements.
Substitution statements are composed of variables and operators whose
combined value is substituted for, or made equal to, some variable.

When grouped, the elements of the statement fall into sets of triples (two
operands and an operator) that can be used to generate an object program
for machine execution. The keys to statement decomposition and object
generation are operator precedence (the order of execution when a statement
is composed of various operators) and a statement scanner.

Item 2 contains a discussion of the programs. Item 3 contains results of
the programming, a comparison with the IBM 7090 program, and the compiler
flow charts and programs. Item 4 contains a discussion of the object
program generated and the object program.

2. DISCUSSION OF THE PROGRAM

a. Statement

The statement selected for demonstration is

Z = (A * B + C * D) * (E ↑ F + ABS I) +

(J ↑ K - L/M)/(NEG P + Q * R)
Each variable is represented as a single alphabetic character, although
a maximum of six are recognized in the MAD translator. It
is assumed that the decoding process has been completed and a table
(the L list) generated with single-word entries corresponding to each
variable and each operator in the statement. The entries in the L
list are in the same order as the elements in the substitution statement.
The L list is scanned from left to right and, depending on the
tests that are satisfied, an output list (the P list) and a temporary
list (S list, or stack) are generated.

Operands are transferred immediately from the L list to the P list.
Operators are transferred to the top of the S list if their precedence
is equal to or greater than that of the operator currently on the top of
this list. Termination and grouping symbols require special handling. Left
parentheses are unconditionally put on the S list. A right parenthesis
causes all operators above the matching left parenthesis to be removed
from the S list and transferred to the P list; both parentheses are then
discarded. The right termination symbol causes the transfer of all
remaining operators in the S list to the P list.
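
The scanning rules in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration, not the Machine II program: the precedence values in `PREC` are hypothetical (this section does not list them), `^` stands for the exponentiation arrow, `NEG` and `ABS` are treated as prefix operators, and the demonstration statement is tokenized as read here.

```python
# Hypothetical precedence values, chosen only for illustration.
PREC = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3, "NEG": 4, "^": 5, "ABS": 6}


def to_p_list(l_list):
    """Scan the L list left to right and return the P list (postfix order)."""
    p_list, s_list = [], []
    for item in l_list:
        if item == "(":
            s_list.append(item)              # unconditionally stacked
        elif item == ")":
            while s_list[-1] != "(":         # unload down to the matching "("
                p_list.append(s_list.pop())
            s_list.pop()                     # discard the "(" itself
        elif item in PREC:
            # As stated in the text, an operator is stacked when its
            # precedence is equal to or greater than that of the stack top;
            # otherwise the stack top moves to the P list first.
            while s_list and s_list[-1] != "(" and PREC[item] < PREC[s_list[-1]]:
                p_list.append(s_list.pop())
            s_list.append(item)
        else:
            p_list.append(item)              # operands go straight to P
    while s_list:                            # right termination symbol
        p_list.append(s_list.pop())
    return p_list


statement = ("Z = ( A * B + C * D ) * ( E ^ F + ABS I ) + "
             "( J ^ K - L / M ) / ( NEG P + Q * R )").split()
p_list = to_p_list(statement)
print(" ".join(p_list))
```

Note that stacking on equal precedence, as the text specifies, makes a chain such as `a - b - c` group from the right; the demonstration statement contains no such chain, so the point does not arise here.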

The  substitution  statement  is  assumed  to  have  less  than  256  elements 
so  that  it  can  be  considered  as  occupying  at  most  one  block  of  memory. 

The  object  program  also  is  assumed  to  require  at  most  one  block  of 
memory.  These  restrictions  are  not  necessarily  fixed  but  could  be 
removed  with  minor  programming  changes. 

When an operator is added to the P list, the preceding two operands
with the operator are sent to a generator program to produce a segment
of the object program. The resultant, R_i, is then entered in
the P list and the scanning continued.

Table XIII-1 shows the L list. Table XIII-2 shows the P list status
and the compiled triples.
A flow chart depicting the process of starting a generator program
is shown in Figure XIII-1 (the illustrations begin on Page 335). The
* routines have found that an operator is to be transferred to the P
list. When the transfer has been accomplished, an instruction starts
the SP block; this in turn starts either SPU or SPB. When the required
data have been transferred to the generator programs G1, G2, G3, or
G4, BRING instructions are executed in SP and G*, I*, or K*, enabling
starting of subsequent blocks of the program. The BRING instructions
request a result that is generated in a block started by the block in
which the BRING resides. The BRING is executed when the result is
generated. Hence, a block is not completely executed until the
BRING is executed. If subsequent STARTs are dependent on a BRING,
they are delayed until the BRING has been executed.
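
As a loose analogy only, the START and BRING behavior described above resembles submitting work to a pool and then waiting on its result. The Python sketch below uses `concurrent.futures` to show that pattern; it is not a model of Machine II, and the function names `generator` and `sp_block` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def generator(op, left, right):
    """Stand-in for a generator program: returns a description of a segment."""
    return f"{left} {op} {right}"


def sp_block(pool, op, left, right):
    """Start a generator, then wait for the result it produces."""
    started = pool.submit(generator, op, left, right)  # analogue of START
    # Analogue of BRING: this call does not complete until the started
    # block has produced its result, so anything after it is delayed.
    result = started.result()
    return result


with ThreadPoolExecutor(max_workers=2) as pool:
    segment = sp_block(pool, "*", "Q", "R")

print(segment)
```

The calling block cannot finish before the result arrives, which mirrors the statement that a block is not completely executed until its BRING is executed.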

b.  Assumptions 

A variable in the L list or P list has the format shown below.

    i:   | INDEX | V | ADDRESS |

The address of the whole word as it is contained in a block is i. Index
is a pointer to a list location containing the symbolic name of the
variable. V is a bit, equal to one, indicating that the word corresponds
to a variable. Address is the memory address of the variable, which is
assumed to have been assigned after L list generation.

An operator in the L, S, or P list has the format shown below.


The address of the whole word as it appears in a block is i. The
precedence of the operator is contained in the PRIX field. V is a
bit equal to 0, indicating that the element is an operator. Code is
the operation code.

The operations considered were floating point arithmetic, absolute
value, exponentiation, and equality.

c.  Program  Techniques 

The program is a left-to-right statement scanner that allows generation
of triples as soon as an operator can be transferred from
the S list to the P list. In the flow diagram of Figure XIII-2, the
ABLE and CHARLIE loop transfers operands from the L list to the
P list. If an element is an operator and its precedence is equal to
or greater than that of the top element of the S list, the S list is
pushed down and the element is put on top of the S list. If the operator
has less precedence than the top element of the S list, then the top
element of the S list is put on the P list. When an operator is placed
on the P list, a generator program is started that produces the
corresponding portion of the object program using this operator and the
two preceding operands on the P list. A resultant address is calculated
and entered in the P list as a variable.

The detection of a left parenthesis results in the transfer of the
element to the top of the stack. A right parenthesis causes all
operators up to the first left parenthesis on the S list to be transferred,
in order, to the P list. Again, each operator transferred to the P
list has a corresponding generator program started that produces a
portion of the object program corresponding to the operator
transferred. The parentheses are then dropped from the S list and are
not transferred to the P list.

When a right termination is detected, all remaining operators on the
S list are transferred to the P list in order. For each operator
transferred, a corresponding generator program is started to produce
a portion of the object program.

A triple is detected each time an operator is transferred from the
S list to the P list. The operands associated with the operator and
comprising the triple are immediately ahead of the operator on the
P list. The resultant address is calculated as the first word address
of the generated object program segment. This resultant address is
then placed in the P list as a variable. It will in turn be considered
as an element of some triple.
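
The scanning rules above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the Machine II
program: the precedence values in PREC, the function name translate,
and the T0, T1, ... names standing in for resultant addresses are all
hypothetical, and unary operators are left out. The push-on-equal
precedence rule is taken as the text states it.

```python
# Hypothetical precedence values; the source does not list the real ones.
PREC = {"=": 0, "+": 1, "-": 1, "*": 2, "/": 2, "^": 3}


def translate(tokens):
    """Scan tokens left to right and return the triples that are generated."""
    p_list = []   # operands and resultant addresses
    s_list = []   # operator stack; the top is the last element
    triples = []  # (operator, left operand, right operand, resultant)

    def emit(op):
        # The two operands immediately ahead of the operator form the triple.
        right = p_list.pop()
        left = p_list.pop()
        result = f"T{len(triples)}"  # stand-in for the resultant address
        triples.append((op, left, right, result))
        p_list.append(result)        # the resultant re-enters P as a variable

    for tok in tokens:
        if tok == "(":
            s_list.append(tok)
        elif tok == ")":
            while s_list[-1] != "(":
                emit(s_list.pop())
            s_list.pop()             # parentheses never reach the P list
        elif tok in PREC:
            # Lower precedence than the top of S: move the top to P first.
            while s_list and s_list[-1] != "(" and PREC[tok] < PREC[s_list[-1]]:
                emit(s_list.pop())
            s_list.append(tok)       # equal or greater precedence: push down
        else:
            p_list.append(tok)       # operand goes straight to the P list
    while s_list:                    # right termination
        emit(s_list.pop())
    return triples
```

For example, translate(["A", "+", "B", "*", "C"]) produces the triple
for B * C first and then a triple that adds A to its resultant.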

The generator programs produce a segment of the object program.
The resultant is always left in the first word of the object program
segment or, in the case of a double precision operation, in the first
two words of the segment. The initial address of a segment is
calculated easily by considering the maximum size of all possible
object program segments. In this example, the size is indicated in the
program as ®. The index i of the operator in the L list is used to
determine i®, which is then added to the object program base address
and results in OB + i®, the initial address of the object program
segment and the location of the resultant of the triple.
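
The address calculation just described amounts to one multiply and one
add. The following is a minimal sketch, assuming the segment size is
simply the largest size any generator can produce; the function name,
the parameter names, and the sample sizes are hypothetical and do not
come from the source.

```python
def segment_address(ob_base, operator_index, segment_sizes):
    """Return the first-word address of the segment for one operator.

    ob_base        -- base address of the object program block (OB)
    operator_index -- index i of the operator in the L list
    segment_sizes  -- sizes of all possible object program segments
    """
    max_size = max(segment_sizes)  # fixed slot size for every segment
    return ob_base + operator_index * max_size
```

Because every operator gets a slot of the same fixed size, segments
generated out of order or in parallel cannot overlap, at the cost of
unused words in the shorter segments.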

Parallelism exists both within any block of the program and between
blocks of the program and the generator programs. The algorithm
is sequential in nature; attempts to scan the substitution statement
in parallel result in invalid intermediate forms of the statement.
Attempts to correct this uncovered a processor control problem.
Figure XIII-1 depicts the sequence of a program started when an
operator is detected. The solid lines indicate the sequence executed
for a block start. The dashed lines indicate the path followed while
the initiating block is waiting for an answer from the initiated block.
The initiated block in turn tests the operator and starts other blocks.

The G-G' loop in Figure XIII-1 starts the SP block, which tests the
operator to determine if it is a unary or binary operator. If it is
unary, the SPU block is started and a return to G is accomplished
by the BRING instruction in G. If the operator is binary, the SPB
block is started. The SPB block tests the operator and, if it is not
an exponentiation, the GENO1 block is started; if it is exponentiation,
GENO3 is started. The SPB block is tested also by the BRING
instruction in G and when it is satisfied the G-G' loop is reinitiated.

Some characteristics of Machine II make it a desirable machine. The
storage of results in the word occupied by the generating instruction
simplifies programming in that no explicit store instruction is
required for MPC storage. However, a store instruction is required
for a memory store operation. The variety of conditional starts is
necessary to enable branching from a block. To execute a two-way
branch from a block, two conditionals are required in that block, both
testing the same word. In some respects, this is undesirable in that
extra instructions are required over what would be required in a
sequential machine. However, no extra time is required because the
two tests are performed in parallel as soon as the test word is
available.

The ability to acquire data from neighboring blocks is necessary for
program continuity. Data from a previous block can be addressed
relative to the previous starting instruction or by absolute block
address. Data from a subsequent block can be retrieved by absolute
address within the started block. A BRING and an M WAIT instruction
for a block to be started should not be used in the same block.

The M WAIT instruction waits until all instructions have been
executed before starting a new block, while the BRING attempts to bring
back a piece of data generated by that block. However, a conditional
or an unconditional start and a BRING instruction are compatible.

Data in memory can be found with the READ MEMORY, READ
MEMORY INDIRECT, and THRESHOLD instructions. The READ
MEMORY is an absolute address instruction, the READ MEMORY INDIRECT
permits an indexed access, and the THRESHOLD instruction permits
a next-higher-than-threshold search.

A disadvantage is the quantity of data that must be passed from one
block to another. In the problem, as many as 8 to 10 words were passed
through a series of blocks. Timewise, this was not detrimental since
all the data were passed in parallel. However, each transfer
required an instruction. In some cases, where a word was being used
as an index, the shift instruction is placed elsewhere in the program
and the index augmented with the result left in its proper relative
location ready for transferral to the next block. Considerable time
is spent in laying out data transfers, especially when attempting to
implement a program loop.

The instruction-erase option allows the programmer to maintain a
minimum of MPC storage. As results are used, they can be erased,
allowing MPC storage of other data. In the problem, however, it
was found that because of the amount of branching and looping the
task of erasing data, or maintaining data and letting subsequent blocks
erase it where there were alternate subsequent blocks, became
difficult and time consuming. One way to surmount the problem would
be to alter the erase instructions so that the instruction would not be
executed until all data had been transferred to the new block. Then
the previous block could be wiped out completely. The instruction
would be a combination of ERASE and a modified M WAIT where the
wait would be dependent on the transferral of all data requested by
the new block of program.

Programming of Machine II has been relatively easy in some
respects and difficult in others. It is easy to program in a
straightforward manner with no attempt at finesse. However, the lack of
indexing seriously hampers any attempt to reduce the size of a
program by using loops. The ability to converse between blocks using
the SHIFT and BRING instructions partly solves the difficulty, but
the necessity for time consuming and tedious program layout of data
to be transferred between blocks still remains when operation is
restricted to the MPC. It is possible that even this difficulty can
be removed by maintaining these data in memory, but here again
the tradeoff sacrifices the speed of obtaining a result from the MPC
for obtaining a result from the memory. The former is obtained
while executing an instruction requesting the result while the latter
requires a READ MEMORY instruction and then an operating
instruction to acquire the same result.

3.  RESULTS  AND  COMPARISON 

The  results  looked  for  in  the  programming  portion  of  the  study  are  time 
to  execute  the  problem,  processors  used  during  execution,  ease  or  dif¬ 
ficulty  in  programming,  comparison  with  the  results  of  programming  a 
similar  problem  on  a  sequential  machine,  and  an  extrapolation  of  times 
for  the  two  machines  and  subsequent  comparison. 

The algorithm finally used in the problem was more sequential in nature
than others examined but did not display the control problems inherent in
some of the others when programmed. The result was a very low average
load on the processors. The average loading was 1.17 processors per
machine cycle for the total problem execution time. Peak loading is
estimated to be no more than 10 processors in any machine cycle. Another
algorithm or a slightly longer or shorter program block could change the
peak, either increasing it or leveling it.

The time required to execute the algorithm for the translation of the
substitution statement and generation of the corresponding object program
was 53.585 msec. The speed ratio of the sequential machine to Machine
II is N to 27, where N is the number of statements to be compiled. If it
is assumed that there is more than one statement to be translated, then
the sequential machine will translate faster until the number of statements
is 27. The fact that there are numerous processors and that the
processor loading is small allows Machine II to process numerous statements
in parallel. When the statement loading exceeds 27, the sequential time
required for translating will continue to increase linearly while the
Machine II time will remain relatively fixed at 54 msec. Assuming
availability of 256 processors, then Machine II can average about
256/1.17 = 219 statements every 54 msec. This would give Machine II a
speed advantage over the IBM 7090 of approximately 219/27 or 8 to 1.
The actual time required for processing, of course, depends upon the
sequence of operators and the amount of grouping within the substitution
statement.
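
The arithmetic in the preceding paragraph can be checked with a short
Python sketch. The constants are the figures quoted in the text; the
function names are hypothetical, and the sequential time per statement
is an inference from the 27-statement break-even point rather than a
value the text gives directly.

```python
PROCESSORS = 256    # processors assumed available
AVG_LOAD = 1.17     # average processors busy per machine cycle, one statement
BATCH_MSEC = 54.0   # roughly fixed Machine II time for one batch
BREAK_EVEN = 27     # statements at which the sequential time reaches 54 msec


def machine_ii_batch_size():
    """Statements Machine II can translate side by side in one batch."""
    return round(PROCESSORS / AVG_LOAD)


def sequential_msec(statements):
    """Sequential translation time, growing linearly with statement count."""
    return statements * BATCH_MSEC / BREAK_EVEN


def speed_advantage():
    """Ratio of Machine II throughput to sequential throughput."""
    return machine_ii_batch_size() / BREAK_EVEN
```

Under these assumptions the batch size comes to 219 statements and the
advantage to a little over 8 to 1, matching the figures in the text.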

4.  OBJECT  PROGRAM 

The object program generated is shown in Table XIII-4. Segments of the
object  program  block  are  generated  each  time  an  operator  is  detected  and 
transferred  to  the  P  list.  Each  segment  is  then  stored  in  memory  in  the 
block  assigned  to  the  object  program.  Addresses  within  the  block  are  de¬ 
termined  by  the  index  of  the  operand  in  the  L  list  and  the  predetermined 
maximum  size  of  any  object  program  segment. 

Some  variables  in  the  segments  reside  in  memory  and  some  reside  in  the 
MPC.  Each  generator  determines  where  a  variable  is  located  and  gener¬ 
ates  the  appropriate  instructions,  either  READ  MEMORY  or  SHIFT  THIS 
BLOCK.  Upon  execution,  the  variable  replaces  the  instruction  and  the 
triple  is  executed  with  the  resultant  then  occupying  the  first  two  words  of 
the  segment. 

The object program can be executed in the same manner as the translator.
It should be noted that the execution could be done simultaneously with
translation so that the statement resultant would be available only shortly
after object generation.

Figure XIII-3 - Able, Baker, Charlie Subroutines


Figure XIII-6 - Dog, Koala, SP, SPU Subroutines


Figure XIII-7 - SPB, GENO1 Subroutines

Figure XIII-8 - GENO2, GENO3, GENO4 Subroutines

TABLE XIII-3 - COMPILER PROGRAM


Item

Instruction

Remarks

Time

P 

R 

ABLE 

a 

SPB 

0 

1 

i  a  -  -  t 

6 

7 

+  1 

SPB 

0 

2 

j  a  -  -  j 

6 

7 

+2 

SPB 

0 

3 

k  a  -  -  k 

6 

7 

0 

SPB 

0 

4 

L  addreaa  a  -  L  - 

6 

7 

+  1 

SPB 

■0 

5 

PL  addreea  a  -  PL  - 

6 

7 

+2 

SPB 

0 

6 

S  addreaa  a  -  S 

6 

7 

43 

SPB 

0 

7 

OB  addreaa  a  -  OB  - 

6 

7 

Y+  1 

THS,  M 

a 

0 

(L  4  i) 

I0 

11 

42 

LOl 

>4  1 

VBIT 

14 

15 

43 

EQZ 

V4  2 

V4  3 

18 

44 

CONS 

BAKER 

OPERATOR 

- 

1 

45 

NEq 

V42 

y  ♦  6 

18 

- 

46 

CONS 

CHARLIE 

OPERAND 

- 

11 

47 

CONS 

VBIT 

1 

BAKER 

a 

SPB 

0 

a 

i 

6 

7 

4 1 

SPB 

0 

a  *  1 

j 

6 

7 

42 

SPB 

0 

a  +  Z 

k 

6 

7 

43 

SPB 

0 

a  +  3 

L 

6 

7 

44 

SPB 

0 

a  4  4 

PL 

6 

7 

45 

SPB 

0 

a  45 

S 

6 

7 

46 

SPB 

0 

0  4  6 

OB 

6 

7 

47 

SPB 

0 

\  +  1 

(L  4  i)  OPERATOR 

6 

7 

0 

L06 

rtf  +  7 

RTERM 

A  •  B 

10 

11 

+ 1 

EQZ 

P 

0  +  2 

0  if  RTERM,  I  otherwiaa 

14 

- 

42 

CONS 

DOG 

- 

1 

43 

L06 

a*  7 

LPARENS 

A  •  B 

10 

11 

44 

EQZ 

0 

(3  4  5 

0  if  LPARENS, 

1  otkerwiee 

H 

- 

45 

CONS 

EASY 

- 

1 

46 

L06 

a  ♦  7 

R PA BENS 

A  •  B 

10 

11 

47 

EQZ 

0 

0  4  8 

0  if  R PARENS, 

i  otharwiaa 

14 

* 

4« 

CONS 

FOX 

* 

1 

Y 

MPY 

0 

0  4  3 

14 

15 

♦  l 

MPY 

Y 

'  6 

II 

15 

42 

NEQ 

V  ♦  1 

V  *  ? 

ii  0  then  D  »  A 

♦  B  4  C 

22 

43 

CONS 

GEORGE 

ii  5  thin  0  *  ABC 

* 

1 

44 

CONS 

RTERM 

• 

1 

4» 

CONS 

LPARENS 

- 

1 

♦<> 

CONS 

R PARENS 

- 

\ 

CHARLIE 

0 

STB 

0 

Y  ♦  1 

l  *  1  4  i 

11 

• 

NOTE: P is the cycle time after the block start in which a processor is active; R is the cycle time after the block start
in which the result is available.

TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

CHARLIE  +1 

STB 

0 

0  +  1 

j  *  j  +  l 

17 

18 

(cont)  +2 

SPB 

0 

0  +  2 

k 

6 

7 

+3 

SPB 

0 

0+3 

L 

6 

7 

+4 

SPB 

0 

a  +  4 

PL 

6 

7 

+5 

SP3 

0 

a  ♦  5 

S 

6 

7 

♦6 

SPB 

0 

0  +  6 

OB 

6 

7 

a 

SPB 

0 

a  +  1 

j 

6 

7 

+i 

ADD 

a 

one 

j  *  j  ♦  1 

-  index  addraal 

21 

22 

+2 

spa 

0 

Y  +  1 

(L  +  i)  = 

OPND 

6 

7 

♦  3 

L07 

o  +  4 

0  +  1 

PL  + j  - 

-  PL  j 

25 

26 

+4 

STO 

0+2 

a  +  3 

29 

30 

Y 

SPB 

0 

a 

Gat  i 

6 

7 

+  1 

ADD 

Y 

one 

i  =  i  +  1 

10 

11 

+2 

MWT 

ABLE 

32 

- 

+3 

CONS 

ONE 

DOG  a 

SPB 

0 

0 

1 

6 

7 

+  1 

ADD 

a  +  & 

one 

J  j 

=  j  +  1 

10 

11 

+2 

ADD 

0  +  9 

ana 

k  k 

=  k  +  1 

10 

11 

+3 

SPB 

0 

0*3 

L 

6 

7 

+4 

SPB 

0 

0  +  4 

PL 

fc 

7 

+5 

SPB 

0 

a  *  5 

S 

b 

7 

♦6 

SPB 

0 

o  +  6 

OB 

6 

7 

♦7 

SPB 

0 

0  +  7 

(L  +  i)  - 

OPERATOR  x  RTERM 

6 

7 

+8 

SPB 

0 

a  +  l 

J 

6 

7 

+9 

SPB 

0 

0  +  2 

k 

6 

7 

*10 

THS 

0  +  9 

0  ♦  5 

(*  ♦  <0 

to 

11 

+  11 

LOb 

0  ♦  10 

NULL 

14 

16 

+  12 

EQZ 

0+11 

0+13 

1 

18 

- 

+  11 

CONS 

HALT 

END 

- 

1 

+  14 

NKQ 

0  +  i  1 

0+13 

18 

+  '3 

CONS 

KOALA 

t 

‘16 

CONS 

one 

, 

- 

1 

♦  17 

CONS 

null 

. 

1 

EASY  « 

STB 

0 

0  ♦  1 1 

.4 

[  14 

15 

+  1 

SPB 

0 

0  +  1 

j 

b 

7 

*2 

STB 

0 

0  +  l  i 

k 

14 

15 

+  ) 

5)  Pd 

1 

0  ♦  * 

L 

ft 

7 

•  4 

SPB 

a 

0  *  *> 

PL 

ft 

T 

SPB 

0 

0  ♦  s 

S 

b 

7 

♦  ft 

SPB 

0 

A  »  *> 

OB 

b 

7 

»T 

SPB 

0 

*  *  f 

(L  +  0  - 

OPEKAND  *  LPARCN5 

ft 

7 

+  * 

LOT 

A  ♦  * 

9  *  4 

•  ♦  ft 

II 

11
TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

EASY  +9 

STO 

a  +  7 

a  +  8 

(L  +  i)  -+  *  +  k  -  1 

22 

23 

(cont)  +1Q 

SPB 

0 

a  +  2 

k 

6 

7 

+  11 

SUB 

a  +  10 

one  i 

k  =  k  •  1 

10 

11 

+  12 

SPB 

0 

a  +  l 

i 

6 

7 

+  13 

ADD 

a  ♦  12 

one 

1  =  1  +  1 

10 

11 

+  14 

MWT 

ABLE 

! 

25 

- 

>15 

CONS 

ONE 

- 

1 

FOX  a 

SPB 

0 

a 

i 

6 

7 

+  1 

STB 

0 

a  +  io 

j 

14 

15 

+2 

STB 

0 

a  +  ll 

k 

18 

19 

+3 

SPB 

0 

0+  3 

L 

6 

7 

+4 

SPB 

0 

a  +  4 

PL 

6 

7 

+5 

SPB 

0 

a  +  5 

S 

6 

7 

+6 

SPB 

0 

a  +  6 

OB  i  PR  EG  v  OP  code 

6 

7 

+7 

SPB 

0 

a  +  7 

(L  +  1)  =  RPARENS 

6 

7 

+8 

SPB 

0 

a  +  l 

j 

6 

7 

+9 

SPB 

0 

a  +  2 

k 

6 

7 

♦  10 

ADD 

a  +  8 

one 

j  +  1 

10 

11 

+  11 

ADD 

a  +  9 

one 

k  +  1 

14 

1" 

+  12 

L07 

a  ♦  5 

a  +  9 

*  +  k  addreee 

10 

11 

+  13 

THS,  M 

0 

a  +  iz 

+  k)  -  PREC  9  OP  code 

!  14 

15 

+  14 

LOl 

a  ♦  13 

OP  MSK 

18 

19 

+  15 

1.06 

0+  14 

LPARENS  MSK 

22 

23 

+  16 

EQZ 

a  +  IS 

a  +  17 

LPARENS  YES 

26 

- 

+  17 

CONS 

HALO 

- 

1 

+  18 

NEQ 

0+15 

8+19 

26 

- 

+  19 

CONS 

IPSWICH 

- 

1 

+20 

CONS 

on* 

l 

* 

1 

+21 

CONS 

OP  MSK  i 

. 

- 

1 

+22 

CONS 

LPARENS  MSK 

- 

1 

GEORGE  a 

SPB 

0 

0 

i 

fe 

7 

+  1 

ADD 

a  •  is 

one 

j » i  ♦  i 

10 

11 

+2 

ADD 

a  +  14 

one 

k  *  k  +  1 

10  ! 

!  u 

♦  S 

SPB 

0 

«  »  3  I 

L 

1 

*  1 

y 

♦4 

SPB 

0 

a  *  4  j 

PL 

fc  i 

7 

*5 

SPB 

0 

a  ♦  > 

& 

+  ! 

7 

SPB 

0 

a  *  * 

i  !v 

Y 

>8 

SPB 

0 

0  *  t 

It.  •  t)  OPERATOR 

1  * 

•» 

♦  ? 

LOl 

•  *  • 

PR  EC  MSK 

OPERATOR  PREC 

1  ,, 

1  1 

♦9 

T.'fS 

4  •  14 

0  *•  s 

(*  *  k) 

io  ! 

!  >* 

+  10 

LOl 

a  •  i 

PR  EC  MSK 

(»  •  k!  o^ttioi  PREC  TOP 

I  >4 

15 

+  11 

SUB 

a  •  J 

0  ♦  10 

OP  PREC  •  (S  -  kl  OP  PREC 

!  IT  i 

1  14
TABLE XIII-3 - COMPILER PROGRAM (Continued)


0  4  10 

a  ♦  13 
SPU 
a  4  is 
SPB 
a  4  14 
a  4  u 
a  4  17 
OP  MSK 
ABS  MSK 
NEC  MSK 


SPB  a  4  14 
SPB  a  4 11 


21  22 
2S  26 


THS,  M 
THS,  M 
THS.  M 


(PL  4  j) 

N(PL  4  j) 

N(N(PL  4  j» 

TAB  OP  CODE  CENO  Addr«*» 


26  27 


22  23 

33  34 


a  4  14 


START  A  GEN  PROG 


OB  ♦  i# 


PL  *  j 
P.i  ■*  PL  ♦  J 


10  11 
22  23 


(PL  -  jl 

N(PI-  »  j!  B  op*>»*4 
N(N(PL  •  jl)  A  ap^rtnd 
OB  *  ,•  -OB  !*• 

■  ■OP  Co4» 

r- 

TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

GENOl 

+7 

ADD 

a  4  3 

one 

OB  ♦  i*  +  1 

10 

11 

(cont) 

+8 

ADD 

ft  ♦  3 

two 

+  2 

10 

11 

♦9 

ADD 

a  ♦  3 

thrte 

+  3 

10 

11 

rlO 

THS 

a  +  4 

♦  M 

14 

15 

+  11 

CONS 

OCT 

- 

1 

+  12 

START 

CIO 

B 

3 

* 

+  13 

START 

CIO 

A 

3 

i  - 

+  14 

BRG 

a  +  12 

6 

B  RDM  or  STB 

29 

30 

+  15 

BRG 

a  +  13 

6 

A  RDM  or  STB 

29 

30 

+  16 

LOl 

a  4  ! 

ADDMSK 

addraaa  of  B  operand 

10 

11 

+  17 

LOl 

a  +  2 

ADDMSK 

address  of  A  operand 

10 

11 

+  18 

L07 

a  +  14 

a  +  16 

RDM  or  STB  B 

33 

34 

+  19 

L07 

a  +  15 

a  +  17 

RDM  or  STB  A 

33 

34 

+20 

LOl 

a  +  8 

IB  MSK 

i*+  for  A 

14 

15 

+21 

LOl 

a  +  9 

16MSK 

i#+  for  B 

14 

15 

+22 

STB 

V2 

a  +  20 

18 

19 

+  13 

L07 

a  +  22 

a  +  21 

22 

23 

+24 

L07 

a  +  5 

a  +  23 

F  -  A  addrsstes  B 

26 

27 

+25 

STO 

a  +  24 

a  ♦  3 

30 

31 

+26 

STO 

a  +  6 

a  +  7 

22 

23 

*27 

STO 

a  +  19 

c  +  8 

37 

38 

+28 

STO 

a  +  18 

a  ♦  9 

37 

38 

+29 

CONS 

one 

- 

1 

+30 

CONS 

two 

- 

J 

+  31 

CONS 

thres 

- 

1 

♦  32 

CONS 

ADDMSK 

- 

1 

+  33 

CONS 

1  •  MSK 

- 

1 

+  34 

CONS 

EOMSK 

- 

1 

G10 

Q 

SPR 

0 

-li 

t 

■; 

+i 

LO! 

0 

INOEXMSK 

10 

11 

£QZ 

a  + 

a  ♦  5 

14 

- 

♦  3 

NEG 

a  '  1 

a  »  4 

14 

*4 

CONS 

GU 

- 

1 

*5 

CONS 

G12 

( 

♦  b 

BPC 

0  +  2 

i 

! 

II 

23 

+  7 

CONS 

INDEX  MSK 

i 

1 

Cli 

0 

RDM 

Ro 

Rl  *  RDM 

16 

17 

GU 

a 

RDM 

SI 

SI  .  STE 

16 

IT 

GEN02 

o 

spn 

0 

a  *  » 

fPi.  - ),  opera: oh 

6 

7 

•  1 

SPB 

0 

a  *  io 

NjPL  •  J)  OPERAND 

6 

7 

♦2 

SPB 

0 

a  ■■  is 

OB  *  i  • 

6 

7 

*1 

5F« 

0 

O  4  11 

OP  CODE 

6 

7
TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

GEN02 

+4 

R'.M 

0 

0*11 

1.06  or  L 1 0 

14 

15 

(coot) 

+5 

RIM 

1 

0*15 

L06  1  r  L.  10 

14 

15 

+6 

RIM 

2 

0  +  14 

STB  22  3  or  CONS  0 

14 

15 

t7 

ADD 

a  +  2 

one 

OB  +  »•  ♦  1 

10 

1  \ 

+8 

ADD 

a  ♦  2 

two 

♦  2 

10 

11 

♦9 

ADD 

♦  2 

three 

♦  3 

10 

11 

+  10 

ADD 

a  ♦  2 

l  CUT 

4-  4 

10 

11 

+  11 

THS 

Of  >  4 

0+12 

ABS  or  NEC. 

10 

11 

+  12 

CONS 

OCT 

- 

1 

♦  13 

START 

G20 

3 

- 

+  14 

BRG 

a  +  13 

fc 

RDM  or  STB 

38 

39 

+  14* 

BRG 

a  ♦  13 

7 

RDM  or  STB 

38 

39 

♦  16 

lot 

0*3 

ABSMSK 

13 

14 

+  17 

EQZ 

0+16 

a  ♦  19 

17 

- 

+  18 

N  KQ 

o+16 

0  +  20 

17 

- 

♦  19 

CONS 

ABS 

- 

1 

+20 

CONS 

NEC 

- 

l 

+21 

BRG 

0+17 

4 

33 

34 

+22 

BRG 

0+17 

5 

33 

34 

+23 

BRG 

a  +  17 

0 

33 

34 

+24 

STO 

a  ♦  39 

a  *  2 

44 

45 

♦25 

STO 

0+30 

0  +  7 

44 

45 

♦26 

STO 

0  +  21 

0  +  8 

37 

38 

+27 

STO 

0+14 

0+9 

« 

41 

♦28 

STO 

0  ♦  14i 

0  *  10 

4  c 

43 

+29 

LO~ 

a  ♦  4 

a  ♦  22 

to 

41 

♦  30 

Lor 

o  ♦  0 

a  *  23 

40 

41 

♦  31 

CONS 

or* 

1 

*  32 

CONS 

two 

' 

l 

*13 

CONS 

three 

1 

♦  34 

CONS 

fau  r 

» 

+  38 

CONS 

abs  msk 

1 

G20 

a 

SPB 

0 

6  »  \ 

OPER  AND 

6 

? 

»i 

l.Ol 

a 

INtH  XMSK 

10 

11 

‘2 

EG  7. 

a  •  i 

a  *  3 

:« 

15 

♦  1 

NEU 

o  ■  1 

O  •  9 

14 

15 

♦  4 

CONS 

G2  1 

• 

1 

•  ^ 

CONS 

(..’2 

1 

13  R  C 

a  *  .* 

l 

RDM  or  S'.  R  A 

24 

28 

V  ? 

add 

u  *  <■ 

.me 

RDM  or  STB  A  •  1 

>; 

*4 

i  ONS 

IMl  MNK 

1 

*4 

CONS 

*ihf 

• 

4
TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

G21  a 

ROM 

RI 

Rl  ROM  - 

17 

18 

G22  0 

PPM 

SI 

SI  STB  - 

17 

18 

spu  a 

SPS 

0 

a 

i 

6 

7 

♦  l 

SPB 

0 

a  4  ! 

j 

6 

7 

+2 

SPB 

0 

0+2 

k 

6 

7 

+  3 

SPB 

0 

0+3 

L 

6 

7 

+4 

SPB 

0 

a  ♦  4 

PL 

6 

7 

♦  5 

SPB 

0 

a  ♦  5 

S 

6 

7 

+6 

SPB 

0 

0  +  6 

OB 

6 

7 

+7 

1.01 

a  ♦  9 

im»k 

14 

15 

+8 

STB 

?.  16 

a  ♦  7 

18 

19 

*9 

THS,  M 

Q  ♦  1 

0  +  4 

(PL  +  j) 

10 

11 

+  10 

THS.  M 

a  ♦  8 

a  ♦  4 

n(PL  ♦  j) 

22 

23 

♦  11 

THS 

a  +  9 

e  +  12 

OP  CODE  GENC  »ddr*«* 

14 

15 

♦  1?.  ' 

CONS 

TAB 

- 

1 

+  13 

MWT 

0+11 

St»rt  A  C  ENERATOR  PROG 

18 

- 

♦  34 

MPY 

a 

• 

10 

11 

+  15 

L07 

o  +  6 

0+14 

OB  +  >0 

14 

15 

+  16 

L.  07 

o  ♦  15 

VBIT 

18 

19 

+  1? 

LC7 

a  ♦  1 

PL  ♦  ; 

10 

11 

♦  38 

STO 

0+16 

0+17 

Ri  -  PL  ♦  j 

22 

23 

+  19 

CONS 

im»k 

- 

1 

+20 

CONS 

VBIT 

- 

1 

GEN03  0 

SPB 

0 

0  +  I  ) 

(PL  ♦  j) 

6 

7 

♦  1 

SPB 

C 

0+12 

N(PI,  *  .*)  B 

6 

7 

+2 

SPB 

0 

0+13 

N(N(PL  +  .-)>  A 

6 

7 

♦  3 

SPB 

0 

0*18 

OB  ♦  »• 

6 

7 

♦4 

SPB 

208 

0+14 

OP  Cod* 

6 

7 

*5 

RIM 

0 

a  *  H) 

PRC  1 

14 

13 

RIM 

1 

0+10 

BUG  2 

14 

15 

♦  7 

RIM 

2 

0  +  10 

DOt'B 

14 

iS 

RIM 

3 

0*10 

mw  r 

14 

15 

♦  4 

RIM 

4 

0  ♦  ^9 

CONS 

14 

13 

•  10 

THS 

0  ♦  4 

0  ♦  1  1 

10 

n 

•  11 

CONS 

OCT 

• 

♦  12 

START 

0  •  1) 

t 

♦  1 » 

CONS 

CSI 

• 

•  14 

PRC 

o  +  U 

4 

PRC  •  •  4  1 

20 

21 

♦14 

PRC. 

»  •  12 

4 

PRC  i*  ♦  6  2 

20 

21 

•  Ik 

START 

•  •  II 

3 

. 

•1* 

START 

•  •  11 

• 

•i» 

CONS

(,«0
TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

CENOJ  +19 

BRG 

a  +  lb 

6 

B  RDM  or  STB 

73 

34 

(cont)  +?0 

BRG 

a  ♦  17 

6 

A  RDM  or  STB 

33 

34 

+21 

LOl 

a  +  i 

ADD  M5K 

10 

11 

+22 

LOl 

a  ♦  2 

ADDMSK 

10 

M 

+23 

L07 

o+19 

9  +  21 

RDM  13  A  add  or  STB  B  add 

37 

38 

+24 

L07 

0  +  20 

0  +  22 

RDM  A  add  or  STB  A  add 

37 

38 

+25 

ADD 

cr  ♦  3 

one 

OB  +  IX  ♦  1 

10 

11 

+26 

ADD 

0  +  3 

two 

+  2 

10 

11 

+27 

ADD 

a  ♦  3 

three 

i  3 

10 

i  * 

+7.8 

ADD 

0+3 

four 

♦  4 

10 

ii 

+29 

ADD 

0  ♦  3 

five 

+  5 

10 

ii 

♦  30 

ADD 

0+3 

fix 

+  6 

:0 

u 

♦31 

ADD 

0+3 

3*ven 

+  7 

10 

u 

♦  32 

STO 

0+14 

a  ♦  3 

BRG  i  43  +  6  ) 

24 

25 

♦  33 

STO 

a  +  15 

0  +  25 

BRG  i  •  +  b  2 

24 

25 

♦  34 

sro 

0  +  24 

0  +  26 

RDM  oj  STB  A 

41 

42 

+35 

STO 

0+  7 

0  +  27 

DOUBLE 

18 

19 

+  36 

STO 

a  +  23 

0  +  28 

RDM  or  STB  B 

41 

42 

+37 

STO 

0  +  7 

0  +  29 

DOUBLE 

18 

19 

♦  38 

STO 

o  +  8 

u  ♦  30 

MWT 

10 

19 

+39 

STO 

o  +  9 

0+31 

CONS  EXP  ROUT 

18 

19 

+40 

CONS 

ADD  MSK 

1 

♦41 

CONS 

one 

1 

*42 

CONS 

two 

1 

♦43 

CONS 

thras 

1 

♦44 

CONS 

four 

1 

+  45 

CONS 

fiva 

1 

CONS 

ata 

1 

♦47 

CONS 

■even 

1 

Gil  a 

SPB 

0 

0*5 

BRG  l 

i> 

7 

♦  l 

SPB 

0 

9  ♦  t 

BRG  2 

6 

7 

*2 

SPB 

88 

0  «  22 

OB  +  !•  +  6 

6 

7 

«) 

STB 

16 

0*2 

■  >•  *  6  - 

3 

4 

LQ7 

a 

9  *  * 

BRG  a  •  • «  1 

10 

11 

*5 

r 

o 

•c 

«  *  l 

9  *  ' 

BRG  t*  +  6  2 

10 

H 

033  « 

SPR 

0 

•14 

A  or  B  oparand 

i 

7 

•  1 

•  1.01 

0 

1NDMSK 

10 

m 

•2 

1X32. 

9  *  \ 

9  fe  4 

14 

*  t 

NEQ 

« »  t 

9  *  4. 

14 

•  4 

CONS 

Oil 

• 

i 

.4 

CONS 

C.12 

• 

1 

><» 

BRC, 

9  *  1 

1 

2 1 

£4 

TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

Remarks

Time

P 

R 

Gil 

0 

ROM 

RI 

RI  RDM  - 

16 

17 

GU 

a 

ROM 

SI 

SI  STB  - 

16 

17 

GEN 04 

a 

SPB 

0 

0+11 

(PL  +  J> 

6 

7 

♦  I 

SPB 

0 

0  4  12 

N(PL  ♦  J)  B  ope  rend 

6 

7 

+2 

SPB 

0 

0  4  11 

N(N(PL  4  j)  A  op* Tend 

6 

7 

41 

SPB 

0 

0+28 

OB  +  ix  OB  lx 

6 

7 

44 

SPB 

208 

0+14 

OP  code 

6 

7 

4* 

RIM 

0 

0+10 

STO 

14 

lb 

♦6 

RIM 

1 

a  +  io 

DOUBLE 

le 

is 

*7 

ADO 

a  ♦  3 

one 

OB  4  1«  4  1 

10 

ii 

48 

ADO 

a  ♦  3 

two 

4  2 

10 

li 

♦9 

AOO 

a  ♦  3 

three 

4  1 

10 

u 

4  10 

YHS 

0+4 

0+21 

10 

n 

*11 

CONS 

OCT 

- 

i 

412 

LOl 

0*2 

ADDMSK 

A  addreee 

10 

n 

411 

LC1 

0  +  1 

ADDMSK 

B  addreae 

10 

n 

414 

STB 

72 

0  +  S 

STO  OB  +  1  •  4  2  OB  +  1  •  ♦  1 

14 

ib 

♦  lb 

L07 

0+11 

0+14 

18 

19 

*U 

STO 

0*14 

0+3 

STO  1*42  i  •  ♦  J 

18 

19 

417 

STO 

0  ♦  * 

0  4  7 

DOUBLE 

18 

19 

418 

STO 

0  ♦  12 

o  +  8 

A  addreae 

14 

15 

♦19 

STO 

0+11 

0  +  9 

B  eddraaa 

14 

lb 

♦20 

CONS 

one 

421 

CONS 

two 

+22 

CONS 

three 

*21 

CONS 

ADDMSK 

ABS 

SPB 

72 

0  +  J 

6 

7 

♦  1 

SPB 

0 

0  +  9 

-  -  .  OB  i  •  1 

6 

T 

*2 

SPB 

0 

0  ♦  10 

•  •  -  08  1*4 

6 

7 

♦  3 

SPB 

0 

0  +  6 

STB  221 

6 

7 

♦  4 

LOl 

0  ♦  l 

0  ♦  1 

STB  22 1  i  *  ♦  J 

10 

1! 

*b 

•-07 

<3 

9  *  i 

1C  *  2  i*  ♦  1 

10 

11 

♦6 

LOT 

0 

0  +  2 

I|<1  *•  *  4 

10 

II 

N  EG 

• 

3PH 

72 

■S  ‘  « 

1  •  ‘  1 

r 

sp» 

72 

e  »  10 

1  •  *  4 

r 

•2 

sri* 

0 

<5  4  * 

>•  4  t 

* 

? 

S.'B 

0 

9  4  10 

»•  *  4 

. 

h 

-» 

*4 

5*11 

0 

0  *  * 

C  QNS  9 

r 

t.O’ 

9 

o  *  < 

t*  •  t  t»  •  » 

10 

It 

*«* 

U>* 

a  *  \ 

•  •  ) 

»»  *  4  l«  •  ♦ 

to 

H 

OCT 

i 

r*o 

»•  •  * 

i»  *  i 

2 

tXHBLE 

% 

•>SJ* 


•» 


i. 


A
TABLE XIII-3 - COMPILER PROGRAM (Continued)


Item

Instruction

1 

OCT  3 

F5U 

i®  ♦  2 

(coot)  4 

DOUBLE 

5 

FDV 

i«  +  2 

itt  +  3 

6 

DOUBLE 

7 

FMP 

1®  +  2 

i®  +  3 

< 

DOUBLE 

r 

STC 

f  ®  ♦  2 

i®  +  3 

EQUAL 

1C 

DOUBLE 

a 

LOS 

IX  +  2 

iX  +  3 

ABS 

♦  l 

LOS 

:x  +  2 

iX  +  4 

+2 

STB 

223 

f  3 

I 

+3 

A  addreaa 

+4 

A  +  1  addreaa 

ff 

L1C 

IX  *  3 

iX  +  3 

NEC 

»1 

L10 

iX  +  3 

iX  +  4 

♦2 

CONS 

0 

+3 

A  addreaa 

+4 

A  t  1  addreaa 

a 

BRG 

i*  ♦  6 

1 

EXP 

+  1 

BRC 

i®  ♦  6 

2 

+2 

RDM 

A  addreaa 

■*3 

DOUBLE 

♦4 

RDM 

B  addreaa 

>5 

DOUBLE 

MATT 

♦7 

CONS 

EXP  ROUT 

- 

Remarks


Time
TABLE XIII-4 - OBJECT PROGRAM (Continued)


Address

Instruction

Remarks

+4 

RDM 

K 

+5 

Double 

+6 

MWT 

24®  +  7 

+7 

CONS 

Exprout 

OB  +  28® 

FDV 

28®  +  2 

28®  +  3 

L/M 

+1 

Doublo 

RDM 

L 

+3 

RDM 

M 

OB  +  260 

FSU 

26®  -i  2 

26®  3 

J  t  K  -  L/M 

+1 

Doublo 

+2 

STB 

0 

24® 

+3 

STB 

0 

28® 

Ob  +  23® 

L10 

23®  +  3 

23®  +  3 

NEG  P 

+  1 

L10 

230  +  4 

23®  +  < 

+2 

CONS 

0 

+3 

RDM 

P 

+4 

RDM 

P  +  1 

OB  +  37® 

PMP 

27®  +  2 

37®  +  3 

Q  •  R 

+  1 

Double 

+2 

RDM 

Q 

+  3 

RDM 

R 

OB  +  36® 

FAD 

35®  +  2 

35®  +  3 

NEG  P  +  Q  •  R 

+  1 

Double 

+2 

STB 

0 

2  3® 

+3 

STB 

0 

37® 

OB  +  31® 

FDV 

31®  +  2 

31®  +  3 

(i  t  K  •  L/M '/(NEG  P  +  Q  •  Rj 

+  1 

Double 

+2 

SYR 

0 

2b® 

+  3 

STB 

0 

35® 

OB  +  21® 

FAD 

21®  +  2 

21®  *  > 

(A  *  B+C  'DIMEtrt  ABS  I) 

+  1 

Double 

+2 

STB 

0 

U® 

U  t  K  -  L/M)/ (NEG  P  *  0  ‘  R) 

♦  1 

STB 

C 

31® 

OB  *  2® 

STO 

2®  ♦  2 

2®  *  3 

+  !  I 

Double 

CONS 

2i« 

+  3 

CONS 

Z 

APPENDIX XIV - PROGRAMMING MANUAL FOR MACHINE II


1. INTRODUCTION

The Machine I parallel processor described in Appendix VI has some
disadvantages. While many tasks could be run concurrently, each task is
sequential and communication between tasks is difficult.

In Machine II, each task (instruction block) can have concurrently
operating instructions and communication between tasks is better. Machine II
has the advantage of better machine utilization since a programmer can
automatically introduce concurrency without spending time setting up new
tasks. Machine II has the advantage of transferring results between tasks
without memory references.

Instructions  generally  consist  of  an  operation  code  and  two  operand  fields. 
When  a  task  is  started,  any  instruction  in  the  task  will  be  performed  when 
its  operands  are  available;  thus  many  instructions  in  a  task  could  be  exe¬ 
cuted  simultaneously. 

2.  BRIEF  DESCRIPTION  OF  MACHINE 

Machine II consists of a multiaccess merging-separating memory (see
Appendices VI and VII) connected to I/O devices and a multiprocessor
control unit (MPC), as diagrammed in Figure XIV-1. The MPC stores a large
set of instructions (on the order of 1000), fetches their operands, and
feeds them to processors for execution. The number of processors may
be in the hundreds. The channel between the MPC and the memory is
large enough to permit the transfer of 1024 words at one time.

3. WORD FORMATS

In memory, a word consists of a 24-bit address and a 32-bit data field.



Figure XIV-1 - Block Diagram of Machine II


Programming is done with 16-bit addresses. Up to 256 different programs
may be running concurrently, each with its own protected set of
65,536 addresses. A program cannot reference another program's addresses
except through the monitor. A given address may be empty, contain one
word, or contain several words. When more than one word is at the
same address, the words are arranged in order of their contents. A normal
reference to an address provides the word whose contents are the least
of all words at the address. A special threshold memory reference allows
the retrieval of the least word at an address whose contents are not
below a specified threshold. This allows instant retrieval of an item in a
table.
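
As an illustration of how a threshold reference can serve as a table
lookup, the sketch below keeps the words at one address in ascending order
and retrieves the least word that is not below a threshold. The packing of
a key into the high bits of each word (the `pack` helper) is an assumption
made for this example only; the text does not prescribe a table layout.

```python
from bisect import bisect_left

def pack(key, value):
    """Hypothetical layout: key in the high bits so words sort by key."""
    return (key << 16) | value

def threshold_read(words, threshold):
    """Least word whose contents are not below the threshold, else None.

    `words` models the contents of one address, kept in ascending order.
    """
    i = bisect_left(words, threshold)
    return words[i] if i < len(words) else None

# Three table entries stored at the same address; memory keeps them ordered.
table = sorted(pack(k, v) for k, v in [(30, 7), (10, 5), (20, 6)])

normal = table[0]                           # a normal reference: least word
hit = threshold_read(table, pack(20, 0))    # threshold reference for key 20
key, value = hit >> 16, hit & 0xFFFF
```

With this layout, a single threshold reference returns the entry for the
smallest key that is at least the requested key, without scanning the table.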

An integer is in the range $1 - 2^{31}$ to $2^{31} - 1$ and is written in
normal ones-complement form (the sign bit is 0 for positive numbers, 1 for
negative numbers). Several integers at the same address will be ordered
with positive integers first in increasing order, then negative integers in
decreasing order of magnitude. Certain integers are shown below:
Integer          Representation

$2^{31} - 1$     0111 1111 1111 1111 1111 1111 1111 1111
$2^{31} - 2$     0111 1111 1111 1111 1111 1111 1111 1110
2                0000 0000 0000 0000 0000 0000 0000 0010
1                0000 0000 0000 0000 0000 0000 0000 0001
+0               0000 0000 0000 0000 0000 0000 0000 0000
-1               1111 1111 1111 1111 1111 1111 1111 1110
-2               1111 1111 1111 1111 1111 1111 1111 1101
$2 - 2^{31}$     1000 0000 0000 0000 0000 0000 0000 0001
$1 - 2^{31}$     1000 0000 0000 0000 0000 0000 0000 0000
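
The integer format and its ordering property can be checked with a short
sketch. The function names `encode`, `decode`, and `show` are hypothetical
helpers introduced here; only the 32-bit ones-complement format itself
comes from the text.

```python
BITS = 32
MASK = (1 << BITS) - 1

def encode(value):
    """32-bit ones-complement pattern of a Python int (positive zero for 0)."""
    if not -(2**31 - 1) <= value <= 2**31 - 1:
        raise ValueError("integer out of range")
    return value if value >= 0 else ~(-value) & MASK

def decode(word):
    """Inverse of encode; an all-ones word (negative zero) decodes to 0."""
    return word if word < (1 << (BITS - 1)) else -(~word & MASK)

def show(word):
    """Format a word as eight groups of four bits."""
    bits = format(word, "032b")
    return " ".join(bits[i:i + 4] for i in range(0, BITS, 4))

# Ordering words by their bit patterns puts positive integers first in
# increasing order, then negative integers in decreasing order of magnitude.
ordered = sorted([3, -1, 0, -5, 2], key=encode)
```

Sorting by representation is all the memory needs to do; the ordering rule
stated above follows from the encoding.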

A floating-point number consists of a fraction sign, an 11-bit exponent
(the fraction is multiplied by any power of 2 between $2^{-1024}$ and
$2^{1023}$), and a 20-bit fraction. A double-length floating-point number
has a 52-bit fraction. A positive floating-point number has a fraction sign
of 0 and the exponent is biased; for example, 0 represents $2^{-1024}$. The
fraction is in the range -1 to 1. A negative floating-point number is formed
by complementing every bit (sign, exponent, and fraction). Certain
single-length floating-point numbers are tabulated below. Note that with
this representation a table of positive normalized floating-point numbers
can be put in order simply by putting their representations in order; this
format allows threshold searches on positive normalized floating-point
numbers. Negative normalized floating-point numbers will be put in
descending order.

Instructions are read into the MPC in blocks of from 1 to 256 instructions
apiece. An instruction block is stored at one address. The instruction
format is:

Number       Operation code     A            B
(8 bits)     (8 bits)           (8 bits)     (8 bits)
Number                      Representation (single-length)

$0.5 \times 2^{1023}$       0111 1111 1111 1000 0000 0000 0000 0000
$0.5 \times 2^{1}$          0100 0000 0001 1000 0000 0000 0000 0000
$0.5 \times 2^{0}$          0100 0000 0000 1000 0000 0000 0000 0000
$0.5 \times 2^{-1}$         0011 1111 1111 1000 0000 0000 0000 0000
$0.5 \times 2^{-1024}$      0000 0000 0000 1000 0000 0000 0000 0000
0                           0000 0000 0000 0000 0000 0000 0000 0000
$-0.5 \times 2^{-1024}$     1111 1111 1111 0111 1111 1111 1111 1111
$-0.5 \times 2^{-1}$        1100 0000 0000 0111 1111 1111 1111 1111
$-0.5 \times 2^{0}$         1011 1111 1111 0111 1111 1111 1111 1111
$-0.5 \times 2^{1}$         1011 1111 1110 0111 1111 1111 1111 1111
$-0.5 \times 2^{1023}$      1000 0000 0000 0111 1111 1111 1111 1111
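
The single-length patterns above can be reproduced with a small sketch. It
assumes an exponent bias of 1024 (so that an exponent field of 0 stands for
$2^{-1024}$) and a 20-bit fraction in which 0.5 is the top bit; the name
`encode_float` and its argument layout are hypothetical.

```python
HALF = 1 << 19                  # the fraction 0.5 as a 20-bit field

def encode_float(fraction_bits, exponent, negative=False):
    """Pack sign (1 bit), biased exponent (11 bits), and fraction (20 bits).

    A negative number is formed by complementing every bit of the
    corresponding positive number.
    """
    if not -1024 <= exponent <= 1023:
        raise ValueError("exponent out of range")
    word = ((exponent + 1024) << 20) | fraction_bits
    return ~word & 0xFFFFFFFF if negative else word

one_half = encode_float(HALF, 0)                  # 0.5 x 2^0
largest = encode_float(HALF, 1023)                # 0.5 x 2^1023
minus_quarter = encode_float(HALF, -1, True)      # -0.5 x 2^-1
```

Because the exponent sits above the fraction, comparing two positive
normalized patterns as integers compares the numbers they represent, which
is what makes threshold searches on such values possible.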

The number identifies the instruction in the block; each instruction in a
block is given a unique number from 0 to 255. The operation code identifies
the operation, while A and B usually identify operands.

An instruction block is read into the MPC in one piece and the instructions
in it may be executed in any order (any instruction is executed whenever
the requisite number of operands are available). The operands for
instructions may be memory words, results of other instructions in the
same block, results of instructions in the block that caused a block to be
read into the MPC, or results of instructions in any block that a block
causes to be read into the MPC. Several instruction blocks may be in the
MPC at one time.
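
The execution rule for a block (an instruction fires as soon as its
operands are available) can be sketched as a small simulation. This is a
simplified model, not the machine's actual control logic: erase options are
ignored, `CONST` here simply yields its a field, and the operation table
`OPS` covers only three fixed-point operations.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MPY": operator.mul}

def run_block(block):
    """Execute a block given as {number: (op, a, b)}.

    Returns the results by instruction number and the list of steps, where
    each step holds the numbers of all instructions that fired together.
    """
    results, pending, steps = {}, dict(block), []
    while pending:
        ready = [n for n, (op, a, b) in pending.items()
                 if op == "CONST" or (a in results and b in results)]
        if not ready:
            raise RuntimeError("operands never become available")
        fired = {}
        for n in ready:
            op, a, b = pending.pop(n)
            fired[n] = a if op == "CONST" else OPS[op](results[a], results[b])
        results.update(fired)       # results become visible on the next step
        steps.append(sorted(ready))
    return results, steps

# The order in which instructions are written does not matter.
block = {
    5: ("SUB", 4, 3),
    1: ("CONST", 2, 0),
    2: ("CONST", 3, 0),
    3: ("ADD", 1, 2),
    4: ("MPY", 1, 2),
}
results, steps = run_block(block)
```

Instructions 3 and 4 fire together in the second step because both depend
only on results 1 and 2; no explicit sequencing is written by the
programmer.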

4. OPERATIONS
a. General


Each operation specifies one or two operands and has options that allow
erasure of its operands. When the result of an operation is being used by
several other operations, one and only one of the other operations should
erase it, as discussed under Item 4.i below.

b. Arithmetic and Logic

(1) Operands

The operands in any of these operations are the results of the operations
numbered a and b in the same block as the operation.

(2) Fixed Point

ADD     Result is (a) + (b)
SUB     Result is (a) - (b)
MPY     Result is (a) * (b)
DVD     Result is (a) / (b)
MOD     Result is (a) Mod (b)

Arithmetic is done modulo $2^{32} - 1$. Negative zeros are never generated.

The operation ADD, A is similar to ADD except that the result in (a) is
erased. Likewise, ADD, B erases the result in (b) and ADD, AB erases both;
all other fixed-point operations have the same options. There are a total
of 20 fixed-point operations. Overflows will be flagged.
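
A brief sketch shows why reducing modulo $2^{32} - 1$ never produces a
negative zero. The helper names are hypothetical, and the overflow test
used here (two operands of like sign giving a sum of the other sign) is an
assumption; the text says only that overflows are flagged.

```python
MODULUS = 2**32 - 1              # ones-complement arithmetic on 32-bit words

def encode(value):
    """Ones-complement pattern of a Python int (hypothetical helper)."""
    return value if value >= 0 else ~(-value) & 0xFFFFFFFF

def sign_bit(word):
    return word >> 31

def add(x, y):
    """Add two ones-complement words; return (sum_word, overflow_flag)."""
    x %= MODULUS                 # folds an all-ones input (negative zero) to +0
    y %= MODULUS
    total = (x + y) % MODULUS    # always in 0 .. 2**32 - 2, so never all ones
    overflow = sign_bit(x) == sign_bit(y) != sign_bit(total)
    return total, overflow

zero = add(encode(5), encode(-5))
```

The sum of 5 and -5 is the all-ones pattern before reduction; the modulus
maps it to the all-zero word, so only positive zero ever appears as a
result.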

(3) Floating Point

FAD     (a) + (b)
FSU     (a) - (b)
FMP     (a) * (b)
FDV     (a) / (b)

Operands and results are normal single-length floating-point numbers.
Each operation has three erase options indicated, for example, by FAD,
A; FAD, B; FAD, AB, which cause erasure of (a), of (b), or of (a) and (b),
respectively. There are a total of 16 floating-point operations.
Overflows will be flagged.

(4) Double-Length Floating-Point

Any of the four floating-point operations may be made double-length by
putting the operation DOUBLE in the instruction following the
single-length operation; if DOUBLE is numbered n + 1, the single-length
operation is numbered n. The continuation of the double-length result is
numbered the same as the double operation. As an example, the program

100   FAD, A    3   6
101   DOUBLE    0   0

would treat the results numbered 3 and 4 as the A operand, the results
numbered 6 and 7 as the B operand, and would number the resultant sum
as 100 and 101. The erase A option causes erasure of the results
numbered 3 and 4. There is one double operation.

(5) Logic

The logic operations combine the corresponding bits of A and B with any
of the 16 possible Boolean functions of two variables. They are listed
below.

L00     0                   (zeros)
L01     (A) ∧ (B)           (and)
L02     (A) ∧ ¬(B)
L03     (A)
L04     ¬(A) ∧ (B)
L05     (B)
L06     (A) ⊕ (B)           (exclusive or)
L07     (A) ∨ (B)           (or)
L10     ¬(A) ∧ ¬(B)
L11     (A) ≡ (B)
L12     ¬(B)
L13     (A) ∨ ¬(B)
L14     ¬(A)
L15     ¬(A) ∨ (B)
L16     ¬(A) ∨ ¬(B)
L17     1                   (ones)

Each of these operations has erase options; for example, L00, A or L00,
B or L00, AB, which cause erasure of A, of B, or of A and B, respectively.
There are 64 logic operations.

(6) Conversions

FXFL (Fixed-to-Floating) - The fixed-point result numbered a is converted
to floating point. Field b is unused. If this operation precedes a double
operation, a double-length floating-point number is created. FXFL, A
erases the result numbered a.

FLFX (Floating-to-Fixed) - The floating-point result numbered a is
converted to fixed point. Overflow is flagged. FLFX, A erases the
operand a.

DLFX (Double-Length Floating-to-Fixed) - The double-length result
numbered a and a + 1 is converted to fixed point. Overflow is flagged.
DLFX, A erases operands a and a + 1.
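
The sixteen logic operations L00 through L17 can be viewed as one
operation driven by a four-bit truth table. The sketch below assumes that
the octal digits of the operation name form that truth table, with the bit
of weight 1 selecting A and B, weight 2 selecting A and not B, weight 4
selecting not A and B, and weight 10 (octal) selecting not A and not B.
This reading agrees with the named entries (L01 and, L06 exclusive or, L07
or) but is an inference, and the function name `logic` is hypothetical.

```python
MASK = 0xFFFFFFFF                # 32-bit data word

def logic(code, a, b):
    """Apply logic operation L<code>, where code is 0o00 .. 0o17."""
    result = 0
    if code & 0o01:
        result |= a & b          # minterm: A and B
    if code & 0o02:
        result |= a & ~b         # minterm: A and not B
    if code & 0o04:
        result |= ~a & b         # minterm: not A and B
    if code & 0o10:
        result |= ~a & ~b        # minterm: not A and not B
    return result & MASK

a, b = 0b1100, 0b1010
```

For example, `logic(0o06, a, b)` combines the two middle minterms and so
yields the exclusive or of the operands.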

(7) Conclusions

In each of these operations, the operands are results of other
instructions in the same block and the result is left numbered the same
as the instruction that caused it.
c. Shift


In the shift operations, the a field contains a shift constant in the range
0 to 255. A shift constant in the range 0 to 63 means a left end-around
shift of from 0 to 63 places. A shift constant in the range 64 to 127
means a left end-off shift of from 0 to 63 places (zeros are written into
the right side of the result). A shift constant in the range 128 to 191
means a left end-off shift of from 0 to 63 places (ones are written into
the right side of the result). A shift constant in the range 192 to 255
means a right end-off shift of from 0 to 63 places (the sign bit is written
into the left side of the result).


In summary:

0 ≤ a ≤ 63        Shift left end-around a places
64 ≤ a ≤ 127      Shift left end-off a-64 places (write zeros)
128 ≤ a ≤ 191     Shift left end-off a-128 places (write ones)
192 ≤ a ≤ 255     Shift right end-off a-192 places (write sign)


The operand is specified in the b field. It is either left alone or erased,
and the result of the shift instruction is the shifted operand. For the STB
operation, the operand is numbered b in the same block as the instruction
(Shift This Block). For the SPB operation, the operand is numbered b in the
previous block (Shift Previous Block). (The previous block is the block that
contained the start instruction that started this block.) For the SPR
operation, the operand is in the previous block b places relative to the
start instruction that started this block (Shift Previous Relative); b is
added to the number of the start instruction modulo 256. The STB, B; SPB, B;
and SPR, B operations cause erasure of the operand.
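
The decoding of the shift constant can be sketched as follows. The function
name `shift` and the `bits` parameter are hypothetical; a 32-bit
single-length word is assumed by default, and the overflow flag on left
shifts is not modeled.

```python
def shift(a, word, bits=32):
    """Apply the shift selected by constant a (0..255) to an unsigned word."""
    if not 0 <= a <= 255:
        raise ValueError("shift constant out of range")
    mask = (1 << bits) - 1
    kind, n = divmod(a, 64)          # kind selects the rule, n the distance
    if kind == 0:                    # left end-around
        n %= bits
        return ((word << n) | (word >> (bits - n))) & mask
    if kind == 1:                    # left end-off, zeros enter on the right
        return (word << n) & mask
    if kind == 2:                    # left end-off, ones enter on the right
        return ((word << n) | ((1 << n) - 1)) & mask
    sign = word >> (bits - 1)        # right end-off, sign enters on the left
    fill = ((1 << n) - 1) << max(bits - n, 0) if sign else 0
    return ((word >> n) | fill) & mask
```

A constant of 70, for instance, falls in the second range and shifts left
end-off by 70 - 64 = 6 places, writing zeros; passing `bits=64` models the
same constant applied to a double-length operand.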


Any shift may be made a double-length shift by following the instruction
with a DOUBLE operation. As an example, the program


43   SPR, B    70   13
44   DOUBLE    0    0


would take the results of the instructions in the previous block that are
the 13th and 14th instructions following the start instruction, erase them,
put them together in a double-length word (the 13th result to the left),
shift them left end-off 6 places (6 places on the left are lost, 6 places in
the right half travel to the left half, and 6 zeros are written in the right
half), then number the resultant halves as 43 and 44 in this block.

In  all  left  shift  operations,  the  result  is  flagged  if  overflow  occurs. 

d. Bring

The shift operations SPB and SPR allow a block to retrieve an item from
the block that started it. The BRING operation (BRG) allows a block to
retrieve an item from a block it starts. The a field identifies the block
by specifying the start instruction in this block which started the block
containing the operand. The b field specifies the number of the result
in that block. The result of the BRG is the particular operand. BRG, B
erases the operand in the started block.

Field a of a BRG or BRG, B operation may refer to a conditional start
instruction that was not satisfied. In this case, the first start
instruction or satisfied conditional start instruction whose number is
above a is selected and reference is made to the block it started. If no
such start exists, an interrupt occurs.

e. Memory References

(1) RDM (Read Memory)

This has as a result the word in memory whose address is (a, b) (a and b
are read as one 16-bit field to form the memory address). If there is
more than one word at the address, the one whose contents is least is
chosen. RDM, M erases the word in memory.

(2) RIM (Read Indirect Memory)

The 16 right-most bits of the result b in this block are used as the memory
address after being incremented by an amount from -128 to 127. The
increment is in the a field of this instruction. RIM, B erases the base
memory address, RIM, M erases the word in memory, and RIM, BM
erases both.

(3) THS (Threshold Search)

This has as a result the word in memory in a given address whose
contents are just above a threshold. The memory address is specified by the
right-most 16 bits of the result in this block numbered b. The threshold
is the result numbered a.

THS, M erases the memory word, THS, A erases the result a in this
block, and THS, B erases the result b in this block. THS, AB; THS,
AM; THS, ABM; and THS, BM erase the indicated combinations of these
items.

In THS, lack of the desired memory word causes a search in the higher
memory addresses with the retrieval of the memory word whose contents
is the least in the first nonempty memory address. This item will be
erased if the memory erase option is specified. Care should be exercised
to prevent unwanted erasures. The result of a THS is flagged if it
comes from an address different from the specified address.

(4) STO (Store)

This causes the storing of a result into memory. The memory address
is the right-most 16 bits of the result numbered b. The result to be stored
is that numbered a. STO, A erases the a-result, STO, B erases the
b-result, and STO, AB erases both. The store operations themselves have
no results. No memory words are overwritten; any words at the specified
memory address will be kept and the new memory word added to that
address.
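
The memory reference operations can be modeled with a short sketch. The
class and method names are hypothetical, addresses and words are plain
Python integers, and returning `None` when no word is found at all is an
assumption, since the text does not say what happens in that case.

```python
from bisect import bisect_left, insort

class Memory:
    """Each address holds zero or more words in ascending order."""

    def __init__(self):
        self.cells = {}

    def sto(self, address, word):
        """STO: add a word; existing words at the address are kept."""
        insort(self.cells.setdefault(address, []), word)

    def rdm(self, address, erase=False):
        """RDM / RDM, M: least word at the address, optionally erased."""
        words = self.cells.get(address)
        if not words:
            return None
        return words.pop(0) if erase else words[0]

    def ths(self, address, threshold, erase=False):
        """THS / THS, M: return (word, flagged).

        If no word at the address reaches the threshold, the least word of
        the first nonempty higher address is returned and flagged.
        """
        words = self.cells.get(address, [])
        i = bisect_left(words, threshold)
        if i < len(words):
            return (words.pop(i) if erase else words[i]), False
        for higher in sorted(k for k in self.cells if k > address):
            found = self.cells[higher]
            if found:
                return (found.pop(0) if erase else found[0]), True
        return None, False

memory = Memory()
memory.sto(100, 7)
memory.sto(100, 3)      # a second word at the same address; nothing is lost
memory.sto(105, 9)
```

The flagged result in the fall-through case is what lets a program detect
that a threshold search ran past the end of its table.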

f. Instruction Block Starts
(1) START

This causes an instruction block to be put in the MPC for execution. The
memory address of the block is specified by the right-most 16 bits of the
result numbered b in this block. START, B is similar to START except
that the result numbered b is erased.

(2) Conditional Starts

The condition of a result in a block can be used to start another block with
the same priority. The a field specifies the result to be tested. The
memory address of the block to be started is contained in the right-most
16 bits of the result numbered b. The conditional starts are:

GTZ     start if (a) > 0
LTZ     start if (a) < 0
GTE     start if (a) ≥ 0
LTE     start if (a) ≤ 0
EQZ     start if (a) = 0
NEQ     start if (a) ≠ 0
FLG     start if (a) is flagged
UNF     start if (a) is unflagged

Each conditional start has erase options; for example, GTZ, A erases
result a, GTZ, B erases result b, and GTZ, AB erases both. These
erasures occur whether the condition is satisfied or not.

(3) SUPER (Supervisor)

This operation brings a supervisory routine into the MPC. The a and b
fields specify which routine. The routine may expect to obtain certain
parameters from results stored relative to the SUPER instruction.
Supervisory routines are allowed certain privileged operations denied
normal routines.

(4) M WAIT

This is an unconditional start instruction except that it is not executed
until all memory operations (RDM, RIM, THS, STO) in this block have
been executed.
g. EPB (Erase Previous Block)

This causes all results in the previous block between and including a and
b to be erased. EPR (Erase Previous Block Relative) is similar except
a and b are relative to the start instruction that started this block.

h. CONST (Constant)

The operation CONST has as a result the number stored in the a and b
fields. Sixteen zeros are inserted in the left-most places.

i. Erasures

If a result is used by only one operation, that operation should erase it.

If used more than once, then it should be erased by (1) the highest
numbered operation in the lowest priority block started by the highest
numbered start instruction that uses it; or, if no started block uses it,
then (2) the highest numbered operation in the same block that uses it; or,
if no operation in the block uses it, then (3) the highest numbered
operation in the previous block that uses it.

5. EXAMPLE PROGRAMS
a. POLY

POLY evaluates a fourth-degree polynomial in X using double-length
floating-point arithmetic. The coefficients $c_0, c_1, c_2, c_3, c_4$ are
stored in POLY + 1 through POLY + 10. The variable X is obtained from the
calling block. The entry should leave the left half of X two results ahead
of the START POLY instruction and the right half of X one result ahead.
POLY erases X and leaves $c_0 + c_1 X + c_2 X^2 + c_3 X^3 + c_4 X^4$ in
results 1 and 2.

In Figure XIV-2, Instructions 3 through 14 are executed first to obtain
the coefficients and X. Instructions 3 and 4 erase X in the calling
program. Instructions 15 through 20 are executed next. On the third step,
Instructions 21 through 28 are executed. On the fourth step, Instructions
29 through 32 are executed. On the fifth step, Instructions 1 and 2 are
executed.

ADDRESS    NO.    OPERATION    A              B

POLY        1     FAD, AB      29             31
            2     DOUBLE       0              0
            3     SPR, B       [illegible]    [illegible]
            4     SPR, B       0              [illegible]
            5     RDM          POLY + 1
            6     RDM          POLY + 2
            7     RDM          POLY + 3
            8     RDM          POLY + 4
            9     RDM          POLY + 5
           10     RDM          POLY + 6
           11     RDM          POLY + 7
           12     RDM          POLY + 8
           13     RDM          POLY + 9
           14     RDM          POLY + 10
           15     FMP          3              3
           16     DOUBLE       0              0
           17     FMP, A       7              3
           18     DOUBLE       0              0
           19     FMP, A       13             3
           20     DOUBLE       0              0
           21     FAD, AB      5              17
           22     DOUBLE       0              0
           23     FMP, A       [illegible]    15
           24     DOUBLE       0              0
           25     FMP, AB      [illegible]    15
           26     DOUBLE       0              0
           27     FAD, AB      [illegible]    19
           28     DOUBLE       0              0
           29     FAD, AB      [illegible]    [illegible]
           30     DOUBLE       0              0
           31     FMP, AB      25             27
           32     DOUBLE       0              0

POLY + 1, POLY + 2       $c_0$
POLY + 3, POLY + 4       $c_1$
POLY + 5, POLY + 6       $c_2$
POLY + 7, POLY + 8       $c_3$
POLY + 9, POLY + 10      $c_4$

Figure XIV-2
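
The five-step schedule can be illustrated with a sketch in which the
operations of each step are independent of one another. The grouping of
products and sums shown here is an assumption inferred from the step
description and the legible operand fields of Figure XIV-2; ordinary
Python numbers stand in for double-length floating-point values, and the
function name `poly_steps` is hypothetical.

```python
def poly_steps(coefficients, x):
    """Evaluate c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4 in five parallel steps."""
    c0, c1, c2, c3, c4 = coefficients    # step 1: fetch coefficients and x

    # Step 2: three independent products.
    x2, c1x, c4x = x * x, c1 * x, c4 * x

    # Step 3: four independent operations on the results of step 2.
    low, c2x2, x3, high = c0 + c1x, c2 * x2, x * x2, c3 + c4x

    # Step 4: two independent operations.
    left, right = low + c2x2, x3 * high

    # Step 5: the final sum.
    return left + right

value = poly_steps((1, 2, 3, 4, 5), 2)
```

Written this way, the polynomial needs three arithmetic steps after the
first products, whereas a strictly sequential evaluation of the same
expression would need one step per operation.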

b. TREE

TREE (see Figure XIV-3) is a program that will cause N executions of a
program specified as PROG, each execution with a different index I,
I = 1, 2, ..., N. PROG should be written to expect I one location ahead
of its start instruction and it should erase I.

TREE is entered with

a - 2:   N
a - 1:   PROG
a:       START TREE

and it erases N and PROG.

TREE consists of five instruction blocks, TREE, TREE + 1, ...,
TREE + 4, and 33 instructions.

6.  CONCLUSIONS 

The normal operations of a multiprocessor design have been described.
There will also be other operations for use by the monitor. This machine
has the advantage of having a machine language wherein parallel
operations can be expressed and executed easily and communication
between concurrently operating portions of the programs can be accomplished.

7. OPERATIONS THAT LEAVE A RESULT

a. Fixed Point

ADD       ADD, A     ADD, B     ADD, AB
SUB       SUB, A     SUB, B     SUB, AB
MPY       MPY, A     MPY, B     MPY, AB
DVD       DVD, A     DVD, B     DVD, AB
MOD       MOD, A     MOD, B     MOD, AB
FLFX      FLFX, A
DLFX      DLFX, A

b. Floating Point (Can Be Double Length)

FAD       FAD, A     FAD, B     FAD, AB
FSU       FSU, A     FSU, B     FSU, AB
FMP       FMP, A     FMP, B     FMP, AB
FDV       FDV, A     FDV, B     FDV, AB
FXFL      FXFL, A

c. Logic

L00       L00, A     L00, B     L00, AB
L01       L01, A     L01, B     L01, AB
L02       L02, A     L02, B     L02, AB
L03       L03, A     L03, B     L03, AB
L04       L04, A     L04, B     L04, AB
L05       L05, A     L05, B     L05, AB
L06       L06, A     L06, B     L06, AB
L07       L07, A     L07, B     L07, AB
L10       L10, A     L10, B     L10, AB
L11       L11, A     L11, B     L11, AB
L12       L12, A     L12, B     L12, AB
L13       L13, A     L13, B     L13, AB
L14       L14, A     L14, B     L14, AB
L15       L15, A     L15, B     L15, AB
L16       L16, A     L16, B     L16, AB
L17       L17, A     L17, B     L17, AB

d. Shift (Can Be Double Length)

STB       STB, B
SPB       SPB, B
SPR       SPR, B

e. Bring

BRG       BRG, B

f. Special

DOUBLE
CONST

g. Memory

RDM       RDM, M
RIM       RIM, M     RIM, B     RIM, BM
THS       THS, A     THS, B     THS, AB
THS, M    THS, AM    THS, BM    THS, ABM

8. OPERATIONS THAT LEAVE NO RESULT

a. Erases

EPB       EPR

b. Store

STO       STO, A     STO, B     STO, AB

c. Starts

START     START, B
SUPER
M WAIT
GTZ       GTZ, A     GTZ, B     GTZ, AB
LTZ       LTZ, A     LTZ, B     LTZ, AB
GTE       GTE, A     GTE, B     GTE, AB
LTE       LTE, A     LTE, B     LTE, AB
EQZ       EQZ, A     EQZ, B     EQZ, AB
NEQ       NEQ, A     NEQ, B     NEQ, AB
FLG       FLG, A     FLG, B     FLG, AB
UNF       UNF, A     UNF, B     UNF, AB
APPENDIX XV - BASIC ORGANIZATION OF MACHINE II


1. INTRODUCTION

Appendix VI describes a parallel processor organization referred to as
Machine I. Machine II was designed to have a more dynamic processor
assignment scheme, automatic concurrency within tasks as well as
concurrent tasks, and a multiprogramming dynamic priority capability.
Appendix XIV describes the machine language and programming
considerations; this appendix describes the hardware implementation.

2. GENERAL DESCRIPTION

Figure XV-1 is a block diagram of Machine II. The memory is a multiaccess
parallel merging-separating memory (see Appendixes VI and VII)
with many (on the order of 1000) parallel channels to the multiprocessor
control. It is needed as a store capable of reading and writing many items
of data simultaneously so that the machine is not memory-limited.


Figure XV-1 - Block Diagram of Machine II

The I/O devices consist of backup memories (core storage, disk storage,
drum storage, tape storage, etc.) and normal I/O units (card equipment,
printers, consoles, channels to other machines, etc.). The number of
I/O channels can be in the hundreds and all channels may be operating at
the same time to give the machine a high I/O data rate. The faster I/O
units can be connected to more than one I/O channel so more than one
word could be transferred in any one cycle.

Each processor is a simple three-register arithmetic unit capable of
performing the arithmetic-logic operations in the instruction set.
Double-length operations are performed by connecting two adjacent processors.
The result of each operation is transferred back to the MPC immediately
after execution, freeing the processor for another instruction that may
come from a different program; this rule simplifies the implementation
of interrupt, multiprogramming, recovery from processor failure, and
other matters in the machine. The number of processors may be in the
hundreds.

The task level computer is used to implement a dynamic task priority
scheme wherein each task can be assigned a certain percentage of machine
capacity and is given execution time at regular intervals.

The memory request sorter (MRS) receives read and write requests from
the I/O devices and processors, orders them by address and data fields,
and transfers them to the memory.

The multiprocessor control (MPC) is the heart of the machine. In one
sense, the MPC acts as a switchboard, connecting all the various parts
of the machine together and allowing hundreds of data transfers to take
place simultaneously. In another sense, it acts as a flexible buffer
matching the data rates in all the data transfers. In still another sense,
it acts as an extensive "instruction look-ahead" unit arranging for the
retrieval of instruction blocks and operands, matching the operands to
their instructions, dispatching the instruction-operand sets to
processors, and storing intermediate results. The MPC is a sorting memory
with certain added features.

3.  MEMORY 

The multiaccess parallel merging-separating memory is essentially that
described in Appendix VII with a few modifications. The format of a word
in memory is shown in Figure XV-2.

The  X  and  Y  fields  designate  six  different  kinds  of  items.  The  memory 
cycle  has  eight  steps: 

1.  Input  new  words,  read  requests,  and  read  and  erase 
requests 

2.  Merge 

3.  Flag  requests  and  associated  memory  words 

4.  Separate  flagged  items 

5.  Present  requested  words  for  output 

6.  Merge 

7.  Flag  requests  and  memory  words  to  be  erased 

8.  Separate  flagged  items  and  erase  them 


Figure XV-2 - Memory Word Format
The memory cycle is longer than described in Appendix VII because of the
need for reading blocks of data (such as instruction blocks). With blocks
of data, there is no convenient method for combining the reading and the
erasing functions; therefore, in Step 5 the data are read but not erased,
and at the end of Step 8 the data are erased by overwriting them with new
requests in Step 1 of the next cycle. An approximate cycle time can be
obtained by assuming 150 nsec per clock period except during separates,
where 250 nsec should be assumed. Using these assumptions and the
fact that the even-numbered steps take n clock pulses (for a $2^n$-word
memory), a cycle time of 0.8n + 0.6 µsec is obtained; for example, a
32,768-word memory has a cycle time of about 12.6 µsec. In each cycle,
1000 items or so may be retrieved.
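
The cycle-time estimate can be reproduced step by step. The sketch below
assumes that each odd-numbered step takes a single clock period, which is
one reading that accounts for the constant 0.6 µsec term; the function
name and parameter names are hypothetical.

```python
def cycle_time_usec(n, clock=0.150, separate_clock=0.250):
    """Approximate memory cycle time in microseconds for a 2**n-word memory."""
    odd_steps = 4 * clock                 # steps 1, 3, 5, 7: one period each (assumed)
    merges = 2 * n * clock                # steps 2 and 6: n pulses at 150 nsec
    separates = 2 * n * separate_clock    # steps 4 and 8: n pulses at 250 nsec
    return odd_steps + merges + separates

# A 32,768-word memory has n = 15.
example = cycle_time_usec(15)
```

The two merges and two separates together contribute
2n(0.150) + 2n(0.250) = 0.8n µsec, and the four remaining steps contribute
0.6 µsec, which gives about 12.6 µsec for n = 15.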

There are six types of items in memory, designated with the following X
and Y fields:


X    Y

0    0000 - Multiple read request (lower limit)
0    0001 - Multiple read and erase request (lower limit)
0    0010 - Read request
0    0011 - Read and erase request
0    0100 - Normal memory word
1    0000 - Upper limit

Each request has a corresponding upper limit.

At the start of a cycle, the MRS presents to the lower part of memory an
inverse-ordered set of requests, upper limits, and new memory words
(Step 1). During Step 2, these are merged with existing memory words.
During Step 3, the following items are flagged by setting their leftmost
Y-field bits:

1. Each request and upper limit

2. Each memory word intervening between a multiple
request and an upper limit

3. Each memory word that is directly above a request

During Step 4, the flagged items are separated from the unflagged and
sent to the lower part of memory. During Step 5, the MPC reads the
flagged items. If the number of flagged items exceeds the channel
capacity, the procedure is different: the uppermost upper limit in the
channel is picked as a dividing point and it and all words higher than it
have their flags reset while all other words are read by the MPC (the words
whose flags were reset will remain in memory for the next memory
cycle). During Step 6, all items are merged. During Step 7, the flags of
any unerased memory words are reset. During Step 8, all flagged items
are separated from the unflagged, sent to the lower part of memory, and
changed to all-zero memory words (0100 in the Y field) or overwritten
with new requests from the MRS.

4. PROCESSORS

Each processor is a three-register arithmetic unit and an instruction
register. The instruction register contains the operation code and task
identification. In each cycle, the MPC-processor interface can transfer
the following:

1. Operand A from MPC to processor

2. Operand B from MPC to processor

3. Operation code and program identification (packed
into one word) from MPC to processor

4. Result of last operation from processor to MPC

During each cycle, the processor performs the indicated operation.
Double-length operations use two adjacent processors. One receives
the upper halves of the operands and a specially flagged operation code
(to indicate upper half); the other receives the lower halves and a
specially flagged operation code (to indicate lower half). A connection
between adjacent processors is used for the necessary interchange of data
between the processors. The task identification is fed to the task level
computer (see 5. below). A memory request is transferred to the memory
request sorter (see 6. below).


5.  TASK  LEVEL  COMPUTER 

When a computer system is being time-shared by several tasks, a means
is needed to transfer control between the tasks. The means could be
hardware or software or a combination of the two. In a system with more
than one processor, the implementation is complicated by the fact that a
given task may be using a dynamic number of processors; to keep the
processors busy, the means for processor assignment should be fast.

In Machine II, all instructions ready for execution are kept in a list in
the MPC ordered by "task levels." The task level is a number assigned to
an instruction block upon entry into the MPC; it governs the priority of
the block relative to all other blocks in the MPC. Blocks with lower task
levels are preferred to those with higher task levels. On each execution
cycle, all processors interrogate the instruction list; this keeps the
processors busy regardless of the changes in any one task.

The task level computer receives task identifications from the processors
and uses this information to keep track of machine usage and to update
task levels. The updated task levels affect the read-in priorities of new
instruction blocks. The example below illustrates the scheme.

Let Machine II have four tasks, A, B, C, and D, and suppose it is desired
to give Task A 50 percent of the machine capacity, Task B 30 percent,
Task C 10 percent, and Task D 10 percent. Give each task an integer, A,
that is inversely related to its desired capacity. A suitable set of A's
in this example is A_A = 3, A_B = 5, A_C = 15, A_D = 15. The task
identification contains A as a subfield.
On each execution cycle, the task level computer increments each task
level by the product of the A for the task and the number of processors
used by the task. This information is contained in the task
identifications fed from the processors. This operation causes the task
level for a task to increase at a rate proportional to its current
machine usage and its A. The task levels govern the priority of the tasks
in future competitions; this has the effect of keeping the task levels
together, since the tasks with lower levels will win future competitions,
causing their levels to increase up to the higher task levels.

The example illustrates this. Let Machine II have 150 processors and
assume that all tasks want to use 100 processors if given the chance.
Table XV-1 shows the task levels at successive execution cycles assuming
a given initial condition. Here, Machine II is used as follows:

Task A, 1050 processor executions (50 percent)

Task B, 650 processor executions (31 percent)

Task C, 200 processor executions (9.5 percent)

Task D, 200 processor executions (9.5 percent)

These percentages are close to the desired percentages (50, 30, 10, and
10). Because Task B obtained slightly more capacity than desired, its
task level is higher so that in future competitions it loses out. In the
long run, the actual machine usage approaches the desired machine usage.

All processors were kept busy each execution cycle (there were always
enough instructions for them to do), the machine usage approximated the
desired machine usage, and every task obtained access to the processors
once in a while. Thus, this seems like a good assignment procedure.
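
The assignment procedure in this example can be replayed with a short simulation. The sketch below is illustrative only: the function name `simulate`, the dictionary layout, and the tie-breaking rule (ties go to the task listed first) are assumptions that the text does not state. The initial levels 100, 200, 300, and 400 are taken from the first row of Table XV-1.

```python
def simulate(levels, weights, processors=150, demand=100, cycles=14):
    """Replay the task-level scheme; returns (final levels, processor executions)."""
    levels = dict(levels)
    usage = {task: 0 for task in levels}
    for _ in range(cycles):
        free = processors
        grants = {}
        # Lower task levels are preferred. sorted() is stable, so ties go to
        # the task listed first (an assumed tie-breaking rule).
        for task in sorted(levels, key=levels.get):
            grants[task] = min(demand, free)
            free -= grants[task]
        for task, used in grants.items():
            # Each level grows by (A for the task) x (processors used).
            levels[task] += weights[task] * used
            usage[task] += used
    return levels, usage


initial = {"A": 100, "B": 200, "C": 300, "D": 400}
weights = {"A": 3, "B": 5, "C": 15, "D": 15}
final_levels, usage = simulate(initial, weights)
total = sum(usage.values())
for task in usage:
    print(task, usage[task], f"{100 * usage[task] / total:.1f}%")
```

Under these assumptions the replay gives 1050, 650, 200, and 200 processor executions for Tasks A through D over 14 cycles, which agrees with the usage figures quoted above.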

As time progresses, the task levels increase; to prevent overflow, a
constant is subtracted from all task levels whenever the highest task
level overflows. The easiest constant to pick is the power of two
represented by the highest bit in the task level field.
TABLE XV-1 - TASK LEVELS AT SUCCESSIVE EXECUTION CYCLES
ASSUMING A GIVEN INITIAL CONDITION

Task A (A = 3)     Task B (A = 5)     Task C (A = 15)    Task D (A = 15)
Level  Processors  Level  Processors  Level  Processors  Level  Processors
  100         100    200          50    300           -    400           -
  400          50    450           -    300         100    400           -
  550           -    450          50   1800           -    400         100
  550         100    700          50   1800           -   1900           -
  850         100    950          50   1800           -   1900           -
 1150         100   1200          50   1800           -   1900           -
 1450         100   1450          50   1800           -   1900           -
 1750          50   1700         100   1800           -   1900           -
 1900          50   2200           -   1800         100   1900           -
 2050          50   2200           -   3300           -   1900         100
 2200         100   2200          50   3300           -   3400           -
 2500          50   2450         100   3300           -   3400           -
 2650         100   2950          50   3300           -   3400           -
 2950         100   3200          50   3300           -   3400           -
 3250           -   3450           -   3300           -   3400           -

The task level computer consists of a small sorting memory and a set of
serial adders. It sorts the task identifications arriving from the
processors, and whenever two words for the same task are sorted together,
an adder adds the two fields and erases one of the words. Over several
cycles, the necessary additions to each task level word are made. The
task level words are periodically fed to the MPC. Note that the task
level word is not updated instantaneously but will usually lag behind;
the effect of this is to introduce some "overshoot" in the process, but
this will not have any effect over the long run.
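
The sort-and-add behavior described above can be sketched in a few lines. This is only an illustration: it assumes each task identification arrives as a `(task, increment)` pair, the name `combine` is hypothetical, and an ordinary software sort stands in for the sorting memory and serial adders.

```python
def combine(identifications):
    """Sort (task, increment) pairs and add together entries for the same task."""
    merged = []
    for task, increment in sorted(identifications):
        if merged and merged[-1][0] == task:
            # Two words for the same task are adjacent after sorting:
            # add the two fields and keep a single word.
            merged[-1] = (task, merged[-1][1] + increment)
        else:
            merged.append((task, increment))
    return merged


# Identifications from five processors in one cycle (made-up example data).
cycle = [("B", 5), ("A", 3), ("A", 3), ("B", 5), ("A", 3)]
totals = combine(cycle)
print(totals)
```

The hardware described in the text spreads these additions over several cycles; the sketch performs them all at once.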
6. MEMORY REQUEST SORTER

Because the memory is a merging-separating memory, the memory
requests must be presented to it in ordered fashion. The memory request
sorter (MRS) gathers all memory requests from the processors and I/O
units and orders them. The ordered set is presented to the memory
during Step 1 of its cycle.

Each write request is one word with the format shown in Figure XV-3.
When this is put in the memory, it will act as a normal memory word.
Each read request consists of two words, an upper limit and a lower
limit. In the lower-limit format shown in Figure XV-3, Y is the code for
the particular type of request (see Item 3) and the threshold is all
zeros except for a threshold search operation. In the upper-limit format
shown in Figure XV-3, the MPC information indicates where the data
retrieved by the request should go in the MPC.


WRITE REQUEST

MEMORY ADDRESS (24) | 0 (1) | DATA (32) | 0100 (4)

READ REQUEST, LOWER LIMIT

MEMORY ADDRESS (24) | 0 (1) | THRESHOLD (32) | Y (4)

READ REQUEST, UPPER LIMIT

MEMORY ADDRESS (24) | 0 (1) | MPC INFORMATION (32) | [illegible]

Figure XV-3 - Memory Request Formats
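
As an illustration of why address-leading request words can be ordered by a plain sort, the sketch below packs a write request using the field widths given in Figure XV-3 (24, 1, 32, and 4 bits). Treating a word as a Python integer, and the names `pack` and `write_request`, are assumptions made for this example; `sorted()` merely stands in for the MRS.

```python
ADDRESS_BITS, FLAG_BITS, BODY_BITS, Y_BITS = 24, 1, 32, 4


def pack(address, body, y_code, flag=0):
    """Pack the four fields into one integer, memory address leftmost."""
    assert 0 <= address < 1 << ADDRESS_BITS
    assert 0 <= body < 1 << BODY_BITS
    assert 0 <= y_code < 1 << Y_BITS
    word = address
    word = (word << FLAG_BITS) | flag
    word = (word << BODY_BITS) | body
    word = (word << Y_BITS) | y_code
    return word


def write_request(address, data):
    # A write request carries 0100 in the Y field.
    return pack(address, data, 0b0100)


requests = [write_request(0x000300, 7),
            write_request(0x000100, 9),
            write_request(0x000200, 8)]
ordered = sorted(requests)
addresses = [w >> (FLAG_BITS + BODY_BITS + Y_BITS) for w in ordered]
print([hex(a) for a in addresses])
```

Because the memory address occupies the leftmost bits, comparing whole words orders the requests by address first.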
7. MULTIPROCESSOR CONTROL
a.  General 

The multiprocessor control (MPC) consists of a sorting memory with logic
between adjacent words to cause certain changes in the words. There are
three kinds of interfaces with the MPC: I/O devices, processors, and
memory channels. The uppermost end of the MPC is the I/O region with
each I/O device connected to one word in the region. Immediately below
the I/O region is the processor region with each processor connected to
three consecutive words in the region. The lowermost end of the MPC is
the memory region with each memory channel connected to three
consecutive MPC words.

The MPC cycle consists of a sort phase during which all MPC words are
sorted, and a transfer phase during which the interfaces read and/or
write into their corresponding words and certain words are interpreted
and modifications made.

The following five kinds of words are in the MPC: α, β, γ, δ, and ε, with
the formats as shown in Figure XV-4. During the transfer phase, these
words are interpreted as follows.

α word: If the word above an α word is an ε word with the
same A field, then move the F field of the α word into the A
and B fields of the same word and copy the C and H fields of
the ε word into the C, F, and G fields. The ε word is
undisturbed while the α word is changed to an ε word with new A,
B, and H fields. If the word above is not an ε word or does
not have the same A field, the α word is undisturbed.

β word: Same as the α word except that the ε word is erased
(C = 100 and all other fields cleared to zeros).

γ word: If the two words above a γ word are ε words with the
same A fields, set the leftmost bit of the γ word and the two ε
words to 1; otherwise, leave all words alone.
Figure XV-4 - Multiprocessor Control Word Formats

δ word: If the word above a δ word is an ε word with the
same A field, its leftmost bit is set to 1 and the δ word is
erased (C = 100 and all other fields cleared to zeros);
otherwise, set the leftmost bit of the δ word to 1.

ε word: Not interpreted except in relation to adjacent α, β,
γ, or δ words.

The set of words in the MPC is divided into seven regions. The size of
these regions varies with time and one or more of them may be empty at
a particular time. The leftmost three bits of each word indicate the
region it is in. The regions are listed below with the three-bit codes.

111 - I/O region
110 - Processor region
101 - Result region
011 - I/O buffer region
010 - Instruction region
001 - Pointer region
000 - Memory region
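
Since the region code sits in the leftmost three bits, sorting the words by value groups them into regions automatically. The sketch below illustrates this with a hypothetical 16-bit word (the real MPC word width is not assumed here); `make_word` and `region_of` are names invented for the example.

```python
REGIONS = {
    0b111: "I/O",
    0b110: "processor",
    0b101: "result",
    0b011: "I/O buffer",
    0b010: "instruction",
    0b001: "pointer",
    0b000: "memory",
}
WORD_BITS = 16  # hypothetical width, chosen only for this example


def make_word(code, payload):
    """Place the three-bit region code in the leftmost bits of the word."""
    return (code << (WORD_BITS - 3)) | payload


def region_of(word):
    # Codes not in the list above are reported as "unlisted".
    return REGIONS.get(word >> (WORD_BITS - 3), "unlisted")


words = [make_word(0b010, 5), make_word(0b111, 1),
         make_word(0b000, 7), make_word(0b110, 2)]
# Uppermost end of the MPC first: larger words sort toward the I/O region.
top_down = [region_of(w) for w in sorted(words, reverse=True)]
print(top_down)
```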

As described above, the MPC interfaces are connected to the I/O region,
processor region, and memory region. The size of the I/O region is fixed.

The operation of the MPC can be described by showing the actions that
occur for words retrieved from memory (words from memory may be words
for output or instructions or data), words from I/O devices, and words
from processors.

b. Output Words

An output device requests a block of consecutive words from memory by
putting a read request or read-and-erase request in the MRS. The upper
limit of the request contains the output device code. When the block
appears on the memory interface to the MPC, an ε word is written for each
word. The upper and lower limits become erased words, while every word
in between has the output device code preceded by the I/O buffer region
code (011) written in the A field. The 24-bit memory address and the
32-bit data field of this word are put in the B and H fields. These MPC
words travel to the I/O buffer region in the next sort phase.

The I/O buffer region is ordered by I/O device number, memory address,
and data field.

c.  Channel  Words 

Every I/O channel (whether the I/O device is operating or not) inserts
into its corresponding MPC word in the I/O region a fixed δ word with the
I/O buffer region code (011) and the I/O device code in the A field. This
word travels to the I/O buffer region and either sends back the least
word in the device buffer, if there is one, or sends back itself if there
isn't one. In this way, each output device reads its own buffer.
I/O units are started by putting specially flagged control words in their
buffers.

d.  Instructions 

The upper limit of an instruction-block read request contains the MPC
block assignment for the block, the program ID, and the MPC address
of the start instruction that caused it to be read in. When an
instruction block arrives over the memory-MPC interface, each word causes
three MPC words to be formed. Two are operand requests and one is the OP
code-program ID word (dummy words are formed in place of operand
requests for instructions that have fewer than two operands). The
operand request format is shown in Figure XV-5.

X is either a zero or a one depending on whether the operand should be
erased or not. The operand request is an α or a β word so that when the
desired operand appears in the result region it is copied and the operand
request sent to the instruction region. The OP code-program ID word is
a γ word so that when the two operand requests return to it, all three
are sent to the processor region.

When a new instruction block is read in, a pointer word containing the
MPC block address and address of the start instruction is put in the
pointer region. This is used by SPB and SPR operations to find operands.
Figure XV-5 - Operand Request Format

e. Data

Data requests are sent to the result region. The upper limit contains the
MPC address.

f. Processor Results

Results of instructions are sent to the result region, addressed
appropriately.

g. Summary

This describes the MPC. Generally speaking, ε words contain data while
α, β, γ, and δ words act as data requests. The I/O region is fixed in
length by guaranteeing a fixed number of words with the I/O region code
(if a δ word finds nothing to send to the I/O region, it sends itself).

An MPC of 8192 words requires 91 steps (1/2 × 13 × 14 = 91) in its sort
phase and 1 step in its transfer phase. At 150 nsec per step, the MPC
cycle is 13.8 μsec. A good assumption for timing example problems, then,
is 13.8 μsec per MPC cycle. Figure XV-6 shows the timing charts.
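
The cycle-time arithmetic can be checked with a small calculation. The sketch below generalizes the quoted figure to k(k + 1)/2 sort steps for a memory of 2^k words; that generalization is an assumption inferred from the 8192-word case (k = 13), and the function name `mpc_cycle_usec` is hypothetical.

```python
def mpc_cycle_usec(words, step_nsec=150):
    """Return (sort steps, cycle time in microseconds) for a 2**k-word MPC."""
    k = words.bit_length() - 1
    if words <= 0 or 1 << k != words:
        raise ValueError("word count must be a power of two")
    sort_steps = k * (k + 1) // 2      # 1/2 x 13 x 14 = 91 when k = 13
    total_steps = sort_steps + 1       # plus one transfer-phase step
    return sort_steps, total_steps * step_nsec / 1000


steps, cycle = mpc_cycle_usec(8192)
print(steps, cycle)
```

For 8192 words this reproduces the 91 sort steps and the 13.8 μsec cycle given in the text.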

8.  CONCLUSIONS 

The various parts of Machine II have been described. The main
difference between it and Machine I is the multiprocessor control (MPC),
which allows automatic dynamic processor assignments, the ability to code
parallel programs without specifically assigning new processors, and
the ability to crosstalk between parallel programs. This enhances the
efficiency of the machine in many programs.
APPENDIX XVI - PARALLEL NONNUMERIC PROCESSING


1.  INTRODUCTION 

Nonnumeric processing is discussed in general along with the
characteristics that are present in present-day machines and those
characteristics that are desirable in a parallel nonnumeric processor.
Ways of implementing these characteristics by means of sorting memories
are discussed. The detailed design of a parallel nonnumeric processor
awaits further study.

2.  NONNUMERIC  PROCESSING 

The words "numeric" and "nonnumeric" when applied to data processing
problems are misnomers. A look at typical numeric and nonnumeric
problems reveals the distinguishing characteristic: the addressing of
data. In a typical numeric problem, most items of data are addressed by
their unique labels (addresses); this can be called "explicit
addressing." In a typical nonnumeric problem, most items of data are
addressed by their properties; this can be called "implicit addressing."
This can be seen when a typical numeric programming language, such as
FORTRAN, in which each item is referred to by a unique label, is compared
with a typical nonnumeric programming language, such as one for list
processing, in which a typical operation might be the searching of a list
structure for a set of items meeting a given pattern.

3.  CLASSES  OF  PROPERTIES 

In  general,  the  properties  by  which  data  are  implicitly  addressed  fall  into 
three  classes: 

1. Any property dependent on an item of data per se; for
example, the property of being greater than or less than a
threshold or the property of having certain of its bits
matching a pattern. This class usually is called
content-addressing.

2. Any maximum or minimum property such as the property
of being the largest or smallest item in a set. This is
referred to here as limit-addressing.

3. Any property dependent on "neighborhoods." When these
occur in a nonnumeric problem, there is a structure (topology)
imposed on a set of data such as lists, trees, matrices, list
structures, etc. A typical property by which items may be
addressed might be the satisfying of a subpattern. This is
referred to here as structure-addressing.


Properties from more than one of these classes may be used in a single
search. For instance, one of the search patterns mentioned in Appendix X
is a string of five items (structure-addressing), the first, third, and
fifth of which are operators and the second and fourth of which are
variables (content-addressing) and in which the precedence of the third
item is greater than that of the first item and not less than that of the
fifth item (limit-addressing). These properties are separated into these
classes because the implementations of searches for properties usually
differ.


4.  SOME  PRESENT-DAY  NONNUMERIC  PROCESSORS 

Most conventional computers are capable only of explicit addressing of
data. A few (the CDC-1604, for example) can perform equality search or
threshold search operations by which a contiguous table in memory can be
content-addressed; these operations search sequentially and thus are
practical only for small tables. To make the solution of some nonnumeric
problems more amenable on a conventional computer, a number of languages
are available, of which LISP, IPL-V, and SNOBOL are examples. In essence,
these languages arrange the storage of data more efficiently so that
structure-addressing is easier; link fields in items represent the
neighborhoods. The amount of time spent in housekeeping in these programs
limits their usefulness to small nonnumeric problems.




Content-addressing memories (CAMs) can perform content-addressing
very well since all of memory is interrogated at the same time. By
adding a fast facility to indicate the presence or nonpresence of
responses, limit-addressing (maximum and minimum searches) also is
performed very well. Structure-addressing can be added with multiple
comparands (see Item 5, below). Single-comparand CAMs might require long
times to do certain structure-addressing problems.

If the problems to be solved are limited to those with a certain topology,
the response store of a CAM could be interconnected in that topology and
a machine obtained that would solve problems in that class very well. Two
machines with this organization are the Illiac III at the University of
Illinois and the SOLOMON. Both of these have the topology of a square
array. On problems that fit the square array, these machines do very
well, while on other problems they lose much of their speed. There are
many different topologies present in nonnumeric problems; for example,
lists, list structures, trees, arrays, and graphs. In many problems, the
topology changes as computation proceeds; hence a machine with a fixed
topology will be limited in purpose. The topology of any practical
nonnumeric problem can be represented by a graph with weighted directed
links; nodes represent items, and links represent the connections or
relations between neighboring items (the link weight shows the kind of
relation). As is shown under Item 5 below, content-addressing can be
changed to structure-addressing, so an organization based on graphs will
have great utility.

5. CONTENT-ADDRESSING BY STRUCTURE-ADDRESSING

Given a processor capable of representing any topology, one can
implement content-addressing. The technique is to separate each item into
its separate fields and connect the fields by weighted links to show where they


Superior numbers in the text refer to items in the List of References
under Item 11, Page 405.
occur, and then coalesce any equal-valued items. Each item contains only
one field and its value is unique, so its value can be used as a label or
address by which it can be explicitly addressed. The example that follows
exhibits this technique. Suppose there are the following eight 3-field
items to be content-addressed:

A  1  3
B  2  1
C  1  2
D  3  3
E  2  2
F  1  1
G  3  1
H  2  1

The  three  fields  of  each  item  are  separated  and  connected  by  a  link  of 
Weight  2  between  the  first  and  second  fields  and  a  link  of  Weight  3  between 
the  first  and  third  fields.  Then  all  equal-valued  items  are  coalesced.  The 
resulting  graph  is: 


A content-address search for those items whose second and third fields
are 2 and 1, respectively, is transformed to a pattern search for

(2) --2-- ( ) --3-- (1)

Any content-address search can be similarly transformed.
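
The transformation can be tried out on the eight example items. The sketch below is an assumption-laden illustration: it stores the graph as a Python dictionary keyed by (first field, link weight), reads item F as "1 1", and uses the invented names `build_graph` and `pattern_search`. Equal field values coalesce simply because they are the same integer.

```python
ITEMS = {"A": (1, 3), "B": (2, 1), "C": (1, 2), "D": (3, 3),
         "E": (2, 2), "F": (1, 1), "G": (3, 1), "H": (2, 1)}


def build_graph(items):
    """Return the links as {(first field, link weight): value node}."""
    links = {}
    for label, (second, third) in items.items():
        links[(label, 2)] = second   # weight-2 link to the second-field value
        links[(label, 3)] = third    # weight-3 link to the third-field value
    return links


def pattern_search(links, second, third):
    """Find nodes with a weight-2 link to `second` and a weight-3 link to `third`."""
    labels = {label for label, _ in links}
    return sorted(label for label in labels
                  if links[(label, 2)] == second and links[(label, 3)] == third)


graph = build_graph(ITEMS)
value_nodes = sorted(set(graph.values()))   # coalesced value nodes
matches = pattern_search(graph, 2, 1)
print(value_nodes, matches)
```

Searching for second and third fields of 2 and 1 finds items B and H, the two items listed as "2 1".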


6. STRUCTURE-ADDRESSING BY CONTENT-ADDRESSING

Given a multiple-comparand content-addressable memory, one can
implement structure-addressing on it; one stores a word for each link of
the graph containing the initial node label, the link weight, and the
terminal node label.

As an example, the graph previously shown could be stored in a CAM as
follows:

Initial node   Link weight   Terminal node
A              2             1
A              3             3
B              2             2
B              3             1
C              2             1
C              3             2
D              2             3
D              3             3
E              2             2
E              3             2
F              2             1
F              3             1
G              2             3
G              3             1
H              2             2
H              3             1
The pattern search

(2) --2-- ( ) --3-- (1)

could be implemented as follows:

1. Find all words with 2, 2 in their second and third
fields (three responses: B, 2, 2; E, 2, 2; H, 2, 2).

2. Form a comparand for each response whose first
field is the first field of the response and whose
second and third fields are 3 and 1 respectively
(three comparands: B, 3, 1; E, 3, 1; H, 3, 1).

3. Find all words that agree with one of these
comparands (two responses: B, 3, 1; H, 3, 1).

B  and  H  satisfy  the  pattern  search  (the  first  fields  of  the  responses  to 
Step  3). 
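
The three steps can be mimicked in software as a rough check. In this sketch a Python list stands in for the CAM, `None` marks a masked (don't-care) field, and the function name `search` is hypothetical; item F is read as "1 1", which does not affect the result.

```python
ITEMS = {"A": (1, 3), "B": (2, 1), "C": (1, 2), "D": (3, 3),
         "E": (2, 2), "F": (1, 1), "G": (3, 1), "H": (2, 1)}

# One CAM word per link: (initial node, link weight, terminal node).
CAM = [(label, weight, value)
       for label, fields in ITEMS.items()
       for weight, value in zip((2, 3), fields)]


def search(cam, comparands):
    """Return every word that agrees with at least one comparand."""
    return [word for word in cam
            if any(all(c is None or c == f for c, f in zip(comparand, word))
                   for comparand in comparands)]


# Step 1: words with 2, 2 in their second and third fields.
step1 = search(CAM, [(None, 2, 2)])
# Step 2: one comparand per response, with 3 and 1 in the last two fields.
comparands = [(label, 3, 1) for label, _, _ in step1]
# Step 3: words that agree with any of these comparands.
step3 = search(CAM, comparands)
result = [label for label, _, _ in step3]
print(step1, result)
```

Step 3 uses all three comparands in a single pass, which is the point of a multiple-comparand memory; a single-comparand CAM would need one pass per comparand.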

This example shows the use of multiple comparands. In general, they
will be required in many structure searches, being formed from the
responses of one step of the search for use in a later step. If the data
structure is large, there may be many comparands in some step of a
search; a single-comparand CAM can only treat these one at a time and may
become unduly slow.

A machine organization using a single-comparand CAM for
structure-addressing is the Association-Storing Processor. Since only one
comparand is permitted at a time, the search algorithm involves a
"backtrack" procedure; that is, after any step it treats one of the
responses in the next step, carrying it to completion, and then treats
the others in later steps. The time spent in a given search depends
strongly on the complexity of the data structure being searched; some
structures may generate numerous responses and hence numerous
backtrackings.

From these thoughts, the desirability of a multiple-comparand
content-addressed memory can be seen.

7.  A  SORTING  MEMORY  AS  A  MULTICOMPARAND  CAM 

a.  General 

One way to build a multicomparand CAM is to use multiple response
stores, one for each comparand. The response store in a CAM is a
major cost item and thus this solution is uneconomical. Another way
is to use a sorting memory (Appendix VI). This has the advantage
that the cost increment is small as comparands are added. One
limitation is that only searches on the left parts of words can be
performed; proper organization of data removes the effect of this
limitation.

b.  Main  Section 

A sorting memory used for multiple-comparand content-addressing
has 11 different words in its main section. Their formats are shown in
Figure XVI-1. The leftmost bits of these words are 0. The high end of
the sorting memory is the readout section and contains words with
leftmost bits equal to 1. The readout section is discussed in Item c
below.

Empty words contain all zeros except for two bits as shown in Figure
XVI-1. Their magnitude is less than that of any other word in memory
and thus they collect at the low end of memory. The low end is used
for input, so the empty words are overwritten with new data or
operations through the input lines.

Each link of the data structure is represented by two link words: a
forward word and a backward word. The node labels are interchanged
and the bit between the A and B fields is changed from 0 to 1 in the
backward word. Because of the sorting action, the forward words of
all links leaving a given node are collected and ordered by their weights.
Similarly, the backward words of all links entering the node are
collected in a set adjacent to the forward words of links leaving the
node.
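
A small sketch can show how sorting alone gathers the link words of each node. The tuple layout used here, (node label, direction bit, link weight, other node label), is an assumed reading of the A field, the bit between A and B, the B field, and the C field; Figure XVI-1 is not reproduced, so the layout, the sample links, and the name `link_words` are all hypothetical.

```python
def link_words(links):
    """Return the forward and backward words for (initial, weight, terminal) links."""
    words = []
    for initial, weight, terminal in links:
        words.append((initial, 0, weight, terminal))   # forward word
        words.append((terminal, 1, weight, initial))   # backward word, labels swapped
    return words


# Three made-up links: P -2-> Q, P -3-> R, Q -1-> P.
links = [("P", 2, "Q"), ("P", 3, "R"), ("Q", 1, "P")]
memory = sorted(link_words(links))
for word in memory:
    print(word)
```

After sorting, the forward words leaving P appear first, ordered by weight, followed immediately by the backward word for the link entering P, which matches the grouping described above.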
Operations 1 and 2 (Figure XVI-1) destroy a word with a given link
weight and node labels. They are characterized by a 1 between the
B and C fields and a 0 in the rightmost position. The bit between the
A and B fields indicates which operation is to be done, forward word
destruction or backward word destruction. In either case, the sorting
action sends the operation word to a location just below the word
to be destroyed. Circuitry in the memory detects the existence of
the operation word and changes it and its corresponding link word
to empty words. If no link word corresponds, the operation word
alone is destroyed.

Operations 3 and 4 read all the links on a given node; a 0 between
the B and C fields and a 11 in the two leftmost C-field positions
characterize these operations. The sorting action sends these operation
words to a location just below the links of the given node. Circuitry
in the memory detects the presence of one of these operation
words and causes the following action:

1. The C-field contents of the operation word
replace the A field of all corresponding link
words.

2. The A- and C-field contents of the operation
word are interchanged.

3. The leftmost bits of the link words and operation
word are changed from 0 to 1.

This action causes the operation word and link words to travel to the
readout section during the next sort cycle. The link words are sent
back to the main section after readout if and only if the rightmost
operation word bit is 1 (Operation 3 rather than Operation 4).

Operations 5, 6, 7, and 8 read all links with a given weight and
direction incident on a given node. A 0 between the B and C fields
and a 01 in the two leftmost C-field positions characterize these
operation words. The sorting action sends the operation words
to a location just below the links to be read. Circuitry
in memory detects their presence and causes exactly the same
action as that for Operations 3 and 4; only the link words with the
desired link weight are treated.

c. Readout Section

As indicated in b above, Operations 3 through 8 cause the leftmost
bits of certain words to be changed from 0 to 1; during the succeeding
sort cycle, these words arrive at the high end of memory (the readout
section). At any time, each of the operation words has a unique
control so that no intermingling of responses between concurrent
operations can occur (the control is put in the A field as discussed
in b above).

Readout  lines  connected  to  the  high  end  of  memory  read  the  contents 
of  the  readout  section  after  which  circuitry  in  memory  causes  the 
following  action: 

1. Any link word associated with an operation word
whose rightmost bit is 0 is overwritten with an
empty word (this destroys this link word).

2. The A field of any link word associated with an
operation word whose rightmost bit is 1 is
replaced by the C field of the operation word (this
was its original A field) and the leftmost bit of
this link word is changed back to 0 (this puts it
back in its original state).

3. Any operation word is overwritten with an empty
word.

In the succeeding sort cycle all undestroyed link words return to their
former positions in the main section of memory.
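
The three readout actions can be sketched in software as follows. This is
only an illustrative model, not the memory circuitry: the classes OpWord and
LinkWord, their field names, and the function readout are hypothetical, and
the word layout of Figure XVI-1 is reduced to the fields the three actions
mention.

```python
from dataclasses import dataclass

@dataclass
class OpWord:
    control: int   # unique control value identifying this operation
    c: int         # C field: the original A field of the links it read
    retain: int    # rightmost bit: 1 = restore the links, 0 = destroy them

@dataclass
class LinkWord:
    flag: int      # leftmost bit: 1 while the word is in the readout section
    a: int         # A field: holds the control value while flag == 1
    rest: tuple    # remaining fields, carried through unchanged

def readout(section):
    """Apply actions 1-3 to the words read from the readout section.

    Returns (responses, survivors): the (control, rest) pairs read out,
    and the link words that go back to the main section.
    """
    ops = {w.control: w for w in section if isinstance(w, OpWord)}
    responses, survivors = [], []
    for w in section:
        if isinstance(w, OpWord):
            continue                      # action 3: operation word is emptied
        op = ops[w.a]
        responses.append((w.a, w.rest))
        if op.retain == 1:                # action 2: restore A field and flag
            survivors.append(LinkWord(flag=0, a=op.c, rest=w.rest))
        # action 1: retain == 0, so the link word is emptied (not kept)
    return responses, survivors
```

Every link word is read out in either case; the rightmost bit of the
operation word only decides whether the link survives the read.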

d.  Conflicts  between  Operation  Words 

There is a possibility that more than one operation word wants to
affect the same link words. These conflicts are detected in the main
section and resolved with the following rules:

1. Operations 1 and 2 take precedence over the other
operations. Any other operation can still read
any link not being destroyed by an Operation 1
or 2.

2. Operations 5, 6, 7, and 8 take precedence over
Operations 3 and 4. Otherwise, conflicts are
resolved in favor of the operation word with the
highest control field. Any operation word losing
out to another by this rule is "delayed" as
discussed below.

An operation word losing a conflict by Rule 2 is delayed by changing
the 1 in its second leftmost C-field position to a 0. The word remains
in this state during the succeeding sort cycle. At the end of this cycle,
the bit is changed back to a 1 but no other action occurs until the end
of the following sort cycle, at which time the operation is tried again
(at this time any undestroyed links that were sent to the readout
section by the operation winning the conflict have been returned to their
original state).
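
As a rough illustration of Rule 2 only, the comparison among competing read
operations can be written as a ranking. The function name rule2_winner and
the (operation number, control field) pair representation are assumptions
made for this sketch; Rule 1 and the bit-level delay mechanism are not
modelled.

```python
def rule2_winner(contenders):
    """Pick the winner among read operations (3 through 8) that want the
    same link words.

    contenders: list of (operation_number, control_field) pairs.
    Returns (winner, delayed), where delayed lists the losing pairs.
    """
    def rank(op):
        number, control = op
        # Operations 5-8 outrank Operations 3-4; within the same class
        # the higher control field wins. Control fields are unique, so
        # no ties arise.
        return (number >= 5, control)

    winner = max(contenders, key=rank)
    delayed = [op for op in contenders if op != winner]
    return winner, delayed
```

For example, among an Operation 3 with control 90, an Operation 6 with
control 10, and an Operation 7 with control 40, the Operation 7 word wins
and the other two are delayed.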

Operations  3  through  8  cause  link  words  to  be  absent  from  the  main 
section  for  two  sort  cycles.  A  simple  rule  can  prevent  the  possibility 
that  an  operation  word  arrives  in  the  main  section  while  the  links  it 
wants  are  in  the  readout  section.  The  rule  is  to  input  operation  words 
with  odd  A  fields  only  during  alternate  cycles  and  operation  words 
with  even  A  fields  only  during  the  intervening  cycles. 

The restriction mentioned in the foregoing paragraph can be removed
by the addition of "place marker" words to those of Figure XVI-1. Such
words would remain in the main section in place of those links sent
to the readout section.
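
The odd/even input rule described above amounts to a parity check on the
input cycle. The sketch below assumes, arbitrarily, that odd A fields are
admitted on odd-numbered cycles and even A fields on even-numbered cycles;
the function name next_input_cycle is hypothetical.

```python
def next_input_cycle(a_field: int, earliest: int) -> int:
    """Return the first cycle >= earliest whose parity matches a_field."""
    return earliest if (earliest - a_field) % 2 == 0 else earliest + 1
```

On this reading, two operation words with the same A field that enter on
different cycles are always at least two cycles apart, which matches the
two sort cycles for which the wanted links are absent from the main section.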

8. A PARALLEL NONNUMERIC PROCESSOR

A parallel nonnumeric processor could be constructed using the
multiple-comparand content-addressed sorting memory described in Item 7.
Figure XVI-2 shows a block diagram of such a processor.

In general, the processing unit sends operation words and new link words
to the memory and receives responses in return. The control fields
originally entered in the operation words wind up in the responses so that no
ambiguity occurs even though many different operation words may be
present. The control fields are used in the processing unit to send the
responses to the correct locations. Input and output channels communicate
with the processing unit. These can be handled in a manner similar to
that described in Appendix XV.
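
The use of control fields to route responses can be pictured with a small
bookkeeping sketch. The class ProcessingUnit and its methods issue and
receive are hypothetical names for illustration; the report does not
specify how the processing unit stores its pending operations.

```python
from itertools import count

class ProcessingUnit:
    """Hands out unique control values and routes responses by them."""

    def __init__(self):
        self._controls = count(1)   # source of unique control values
        self._destinations = {}     # control -> list collecting responses

    def issue(self, destination: list) -> int:
        """Register a destination and return the control value to place
        in the operation word sent to memory."""
        control = next(self._controls)
        self._destinations[control] = destination
        return control

    def receive(self, responses):
        """Deliver each (control, payload) response to its destination."""
        for control, payload in responses:
            self._destinations[control].append(payload)
```

Because each response carries the control value of the operation that
caused it, responses from concurrent operations may arrive interleaved in
any order and are still delivered to the right place.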

Further development of the processing unit is dependent on development
of general-purpose structure-search algorithms. The basic form for an
algorithm that treats both searches with loops and "loop-free" searches
is discussed below. Arithmetic and other operations also need to be included
to obtain a useful machine.

Figure XVI-2 - A Parallel Nonnumeric Processor

9. ALGORITHM FOR PARALLEL-STRUCTURE SEARCHES

This  algorithm  will  search  any  data  structure  for  a  subpattern  that  meets 
the  following  conditions: 

1.  All  link  weights  in  the  search  pattern  are  constant 
(have  known  weights),  and 

2.  At  least  one  node  in  the  search  pattern  is  constant 
(has  a  known  label). 

The algorithm works in "parallel," treating all possible search candidates
simultaneously. The time in most cases is proportional to the number of
links in the search pattern unless storage limits are reached.

The algorithm produces a set of n-tuples where n is the number of
variables in the search pattern. Depending on the implementation of the
algorithm, the n-tuples might be stored as ordered sets of words in the
processing unit or the n-tuples might be represented in memory; for
example, by a new node for each n-tuple, connected to a fixed node and to
all of its members by links with certain weights.

We assume the variable nodes of the search pattern are labelled by X_1,
X_2, ..., X_n. The ith term (for 1 ≤ i ≤ n) of each final n-tuple will
contain the node label of X_i in the subpattern corresponding to the n-tuple.

With no loss of generality, it can be assumed that for any pair of variables
in the search pattern there exists at least one path between them incident
only on variable nodes. If this condition is not met, then the search
pattern can be split along some of its constants into two or more disconnected
pieces; each of the pieces meets the condition and can be treated
independently of the other pieces. Furthermore, any link between constants
is redundant and can be removed.

The  only  housekeeping  required  is  a  method  of  marking  treated  links  in 
the  search  pattern  to  distinguish  them  from  untreated  links. 

The  algorithm  is  as  follows: 

Step 1. Pick any link in the search pattern incident on a
constant. The other node of the link is a variable, say
X_i. Depending on link direction, send an Operation 5 or 7
word to memory (see Figure XVI-1) using the constant and
link weight in fields A and B, respectively. The
responses are candidates for X_i. Form an n-tuple for
each response with the response node label as its ith
member. Mark the search pattern link as being treated.
Go to Step 2.

Step  2.  Is  there  any  untreated  link  in  the  search  pattern 
between  a  constant  and  a  variable  incident  on  a  treated 
link?  If  so,  go  to  Step  3;  otherwise,  go  to  Step  4. 

Step 3. Let the untreated link of Step 2 have link weight
W, constant node C, and variable node X_i. Depending on
link direction, send an Operation 5 or 7 word to memory
with C and W in fields A and B, respectively. Compare
the responses to the ith members of all n-tuples and
destroy any n-tuple whose ith member does not correspond
to any response. Mark the link as being treated. Go to
Step 2.

Step 4. Is there any untreated link in the search pattern
between two variable nodes, each of which is incident on
some treated link? If so, go to Step 5; otherwise, go to
Step 6.

Step 5. Let the untreated link have weight W, initial node
X_i, and terminal node X_j. For each n-tuple, send an
Operation 5 word to memory with its ith member in the
A field and W in its B field and discard the n-tuple if its
jth member does not agree with any of the responses.
Mark the link as being treated. Go to Step 4.

Step 6. Are all links in the search pattern treated? If
so, the algorithm is complete; if not, pick an untreated
link one of whose nodes is incident on a treated link and
go to Step 7.


Step 7. Let the untreated link have weight W. Let the
node incident on a treated link be X_i and let the other
node be X_j. Depending on link direction, send an
Operation 5 or 7 word to memory for each n-tuple. The A
field of the operation word is the ith member of the
n-tuple and the B field is W. If the operation word for the
n-tuple has m responses, replicate the n-tuple m times
and put one response in the jth member of each copy.
Mark the link as being treated. Go to Step 2.

This algorithm can be halted after Steps 1, 3, 5, or 7 if no n-tuples are
present. This condition means that no subpattern of the data structure
matches the search pattern.
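
The seven steps can be traced with a sequential software sketch. It is not
the parallel hardware algorithm: the function read stands in for an
Operation 5 or 7 word (which operation number goes with which direction is
not assumed here), links are modelled as (source, weight, destination)
triples, variables are strings beginning with "?", and all names are
hypothetical. The per-n-tuple loops would be issued concurrently in the
memory described in this appendix.

```python
def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def read(links, node, weight, outgoing):
    """Stand-in for an Operation 5 or 7 word: labels at the far end of all
    links of the given weight and direction incident on node."""
    if outgoing:
        return {d for (s, w, d) in links if s == node and w == weight}
    return {s for (s, w, d) in links if d == node and w == weight}

def const_var(link):
    """For a constant-variable link return (constant, variable, outgoing),
    where outgoing means the link leaves the constant; otherwise None."""
    src, _, dst = link
    if is_var(dst) and not is_var(src):
        return src, dst, True
    if is_var(src) and not is_var(dst):
        return dst, src, False
    return None

def search(links, pattern):
    untreated = list(pattern)
    # Step 1: start from any link incident on a constant.
    first = next((l for l in untreated if const_var(l)), None)
    if first is None:
        raise ValueError("pattern needs a constant joined to a variable")
    const, var, out = const_var(first)
    tuples = [{var: r} for r in read(links, const, first[1], out)]
    bound = {var}                      # variables incident on a treated link
    untreated.remove(first)
    while untreated and tuples:        # halt early if no n-tuples remain
        # Steps 2-3: constant joined to an already bound variable.
        link = next((l for l in untreated
                     if const_var(l) and const_var(l)[1] in bound), None)
        if link is not None:
            const, var, out = const_var(link)
            found = read(links, const, link[1], out)
            tuples = [t for t in tuples if t[var] in found]
            untreated.remove(link)
            continue
        # Steps 4-5: link between two bound variables. Returning to the
        # Step 2 test afterwards is harmless since nothing new is bound.
        link = next((l for l in untreated
                     if l[0] in bound and l[2] in bound), None)
        if link is not None:
            src, w, dst = link
            tuples = [t for t in tuples
                      if t[dst] in read(links, t[src], w, True)]
            untreated.remove(link)
            continue
        # Steps 6-7: extend the n-tuples to a new variable.
        link = next((l for l in untreated
                     if l[0] in bound or l[2] in bound), None)
        if link is None:
            raise ValueError("pattern is not connected through variables")
        src, w, dst = link
        out = src in bound
        known, new = (src, dst) if out else (dst, src)
        tuples = [{**t, new: r} for t in tuples
                  for r in read(links, t[known], w, out)]
        bound.add(new)
        untreated.remove(link)
    return tuples
```

Here each n-tuple is a dictionary from variable name to node label, filled
in as variables become bound. A pattern that is not connected through its
variables is rejected rather than split, in line with the assumption stated
before Step 1.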

10.  CONCLUSIONS 

This appendix has discussed nonnumeric processing in general, the
characteristics present in current machines, and the characteristics
desirable in a parallel nonnumeric processor. It was shown that a
multicomparand CAM exhibits these desirable characteristics. It was
further shown that a sorting memory can serve as an implementation of
a multicomparand CAM. A general form for a parallel nonnumeric
processor was described. A basic search algorithm was presented for parallel
structure searches.

Time limitations prevented development of the detailed processor design.
A general search algorithm should also be developed. However, study to
date indicates that a machine patterned after the organization in this
report would be capable of solving large nonnumeric problems significantly
faster than other existing schemes.